Abstract We present a new approach to rigid-body motion segmentation
from two views. We use a previously developed nonlinear embedding of
two-view point correspondences into a 9-dimensional space and identify
the different motions by segmenting lower-dimensional subspaces. To
overcome the mixed and unknown dimensions of the subspaces and the
nonuniform distributions along them, we suggest the novel concept of
global dimension, and its minimization for clustering subspaces, with
some theoretical motivation. We propose a fast projected gradient
algorithm for minimizing global dimension and thus segmenting motions
from two views. We develop an outlier detection framework around the
proposed method, and we present state-of-the-art results on
outlier-free and outlier-corrupted two-view data for segmenting motion.

Keywords Global Dimension • Empirical Dimension • 
Subspace Clustering • Hybrid-Linear Modeling • Motion 
Segmentation • Outliers • Robust Statistics 
1 Introduction

A classic problem in computer vision is that of feature- 
based motion segmentation from two views. In this prob- 
lem one has two images, taken at different times, of a 3D 
scene. The scene is assumed to consist of multiple, independently
moving rigid bodies. The goal is to identify the
different moving objects and estimate a motion model for 
each one of them. For this purpose, one automatically tracks 
the locations of visually-interesting "features" in the scene, 
which are visible in both views (e.g., via a Lucas-Kanade-type
algorithm IT]). Each feature is represented as a pair of 2-vectors,
holding the image coordinates of the feature in the
two different views; such a pair is referred to as a point cor- 
respondence. The mathematical problem of feature-based, 
two-view motion segmentation is to both segment the point 
correspondences according to the rigid objects to which they 
belong, and estimate a motion model for each object. 

A basic strategy to solve the feature-based two-view motion
segmentation problem is to first cluster point correspondences and
then estimate the single-body motions within clusters (well-known
methods for single-body motion estimation are described in [12,15]).
This procedure was suggested in [10], while clustering point
correspondences with K-means or spectral clustering, and in [20],
while alternating between clustering and motion segmentation via an EM
procedure. Both clustering strategies of [10] and [20] are based
primarily on spatial separation between the clusters; however,
different clusters in this setting may intersect each other (e.g.,
when motions share a symmetry).

Due to this problem, some algebraic methods have been developed for
directly solving for the motion parameters, while eliminating the
clustering of point correspondences [24,18]. Another solution is to
segment feature trajectories by taking into account their geometric
structures, which may be different from spatial separation (the
feature trajectories in 2 views are the 4-dimensional vectors
concatenating the two image points of each point correspondence).
Costeira and Kanade [7] showed that under the affine camera model,
feature trajectories in $n$ views (for $n \geq 2$ these are vectors of
length $2n$) within each rigid body lie on an affine subspace of
dimension at most 3. This observation has given rise to several
feature-based motion segmentation schemes, which are based on
clustering subspaces [23]; we refer to such clustering as Hybrid
Linear Modeling (HLM). For the more general and realistic model of the
perspective camera, it can be shown that feature trajectories from two
views lie on quadratic surfaces of dimension at most 3 (in
$\mathbb{R}^4$) (see §2).

Arias-Castro et al. [8] suggested clustering the quadratic surfaces of
point correspondences (in $\mathbb{R}^4$) using Higher Order Spectral
Clustering (HOSC) for manifold clustering. They demonstrated
competitive results on the outlier-free database of [18], when
assuming that the clusters are of dimension 2. However, their results
are not competitive when the clusters are assumed to be of dimension
3, and they did not provide any numerical evidence that the dimension
of the surfaces was 2 and not 3.

A different approach for clustering these particular 
quadratic surfaces can be obtained by embedding point correspondences
into "quadratic coordinates" and then clustering subspaces. More
precisely, if a point correspondence $((x,y),(x',y'))$ is mapped into
$(x',y',1) \otimes (x,y,1) \in \mathbb{R}^9$, where $\otimes$ denotes
the Kronecker product, then these
quadratic surfaces are mapped into linear subspaces of di- 
mensions at most 8, which are determined by the funda- 
mental matrices ll2l[T5l l4l of the different motions. Chen et 
al. ID have used this idea for clustering such quadratic map- 
pings of point correspondences by the Spectral Curvature 
Clustering (SCC) algorithm [5] (they showed that instead
of performing the actual mapping, one can apply the ker- 
nel trick). They claimed that other HLM algorithms (at that 
time) did not work well for such embedded data. 

The drawback of applying SCC to this quadratic mapping of point
correspondences in $\mathbb{R}^9$ is that SCC does not work well with
subspaces of mixed dimensions, and the subspace dimensions must be
known a priori. Unfortunately, the subspaces in this application have
mixed and unknown dimensions (see §2). What makes SCC successful for
this application is the fact that it takes into account some global
information of the subspaces (i.e., for $d$-dimensional subspaces it
uses affinities based on arbitrary $d+2$ points, and in particular,
far-away points). This helps SCC deal with nonuniform sampling along
subspaces with local structure very different from the global one (see
§2). On the other hand, local methods (e.g., [25,6]) often do not work
well in this setting.

The purpose of this paper is to develop an HLM algorithm that can
successfully cluster the quadratically-embedded point correspondences
in $\mathbb{R}^9$. In particular, it exploits the global structure of
the underlying subspaces, i.e., their "dimensions".

For this purpose, we propose a class of empirical dimension
estimators, and a corresponding notion of global dimension for a
mixture of subspaces (a function of the estimated dimensions of its
constituent parts). We propose the global dimension minimization (GDM)
algorithm, which is a fast projected gradient method aiming to
minimize the global dimension among all data partitions. We also build
an outlier detection framework into this development to allow for
corrupted data sets. We demonstrate state-of-the-art results for
two-view motion segmentation (via quadratic embedding), both in the
outlier-free and outlier-corrupted cases. We even show that these
results are competitive with the state-of-the-art results for multiple
views, i.e., using all frames of a video sequence (obtained under the
affine camera model). To motivate the use of global dimension, we
prove that for special settings and choice of parameters, the global
dimension is minimized by the correct partition of the data
(representing the underlying subspaces). We then discuss what to do in
more general settings.

The paper is organized as follows: §2 briefly explains how the problem
of 2-view motion segmentation can be formulated as a problem in HLM;
§3 introduces global dimension and explains why its minimization can
solve the HLM problem under some conditions; §4 develops a fast
projected gradient method for minimizing global dimension; §5 develops
an outlier detection/rejection framework for global dimension
minimization; §6 demonstrates numerical results on real-world 2-view
data sets for both outlier-removed and outlier-corrupted data;
finally, §7 concludes this work. The appendix contains proofs of the
key results in the paper.



2 Formulating 2-View Motion Segmentation as a
Problem in HLM

One way of formulating the motion segmentation problem in 
terms of HLM is by exploiting the Affine Motion Subspace. 
Costeira and Kanade [7] demonstrated that when a set of
features all come from a single rigid body, then under the 
assumptions of the affine camera model, the corresponding 
feature trajectories lie in an affine subspace of dimension 3 
or less. One can use this fact to partition the set of features 
by clustering their trajectories into subspaces. This is a pop- 
ular formulation of the segmentation problem, even when 
dealing with only two views of a scene. 

The formulation involving the affine motion subspace 
has the advantage that the feature trajectories tend to be 
nicely distributed in their respective subspaces, and the dif- 
ferent subspaces all have nearly the same dimensions. This 
formulation has the drawback that it requires an affine cam- 
era model. The affine camera assumption breaks down when 
viewing objects close to the camera, or when looking at ob- 
jects at significantly different ranges. The consequence of 
this is that the trajectories from a rigid body do not lie within 
a subspace, but rather in a manifold which is only locally ap- 
proximated by a subspace of dimension at most 3. 

When dealing with 2-view segmentation, there is another approach,
based on a more general camera model, which avoids this problem of
distortion. This approach assumes a perspective camera, and relies on
the fundamental matrix [12] for a rigid body.

Indeed, if $F = (F_{ij})_{i,j=1}^{3}$ is the fundamental matrix for a
rigid body, and $\mathbf{x}_h = (x,y,1)^T$ and
$\mathbf{x}_h' = (x',y',1)^T$ together form a point correspondence (in
standard homogeneous coordinates) from that body, then

$$\mathbf{x}_h'^{T} F \, \mathbf{x}_h = 0, \tag{1}$$

which is algebraically equivalent to

$$\mathrm{vec}(F) \cdot v = 0,$$

where

$$\mathrm{vec}(F) = (F_{11},F_{12},F_{13},F_{21},F_{22},F_{23},F_{31},F_{32},F_{33})^T$$

and

$$v = (xx', x'y, x', xy', yy', y', x, y, 1)^T = (x',y',1)^T \otimes (x,y,1)^T.$$

We refer to the vector $v$ as the nonlinear or Kronecker embedding of
a point correspondence (recall that $\otimes$ is the Kronecker
product). The vectors obtained through this nonlinear embedding for
feature points on the same rigid object lie in a linear subspace of
$\mathbb{R}^9$ of dimension at most 8. Indeed, (1) says that there is
a vector $\mathrm{vec}(F) \in \mathbb{R}^9$ orthogonal to all of the
embedded feature trajectories in this set (it also shows that the
linear embedding $(\mathbf{x}_h, \mathbf{x}_h')$ lies on a
3-dimensional quadratic manifold). However, the subspace dimension can
be lower for two different reasons. First, if there are very few
points (per motion), then they may span a lower-dimensional subspace.
The second cause is degeneracy in the 3D configuration of the
features. If all world points and both camera centers lie on a ruled
quadratic surface¹, then their corresponding subspace has dimension 7
or less. In particular, if all world points (but not necessarily the
camera centers) are coplanar, the corresponding subspace will have
dimension no larger than 6 (see [12, pg. 296]). Therefore, to make use
of this embedding, the hybrid-linear modeling algorithm being employed
must be tolerant of subspaces of mixed dimension.
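
The embedding and the orthogonality relation vec(F) · v = 0 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, an identity calibration matrix (so that F = [t]_x R), and hypothetical values for the rotation, translation and world points. The function name `kronecker_embedding` is introduced here for illustration only.

```python
import numpy as np

def kronecker_embedding(x, x_prime):
    """Map correspondences (x, y) <-> (x', y') to 9-vectors v.

    x, x_prime: arrays of shape (N, 2). Each output row is
    (xx', x'y, x', xy', yy', y', x, y, 1).
    """
    ones = np.ones((x.shape[0], 1))
    xh = np.hstack([x, ones])                # (x, y, 1)
    xh_prime = np.hstack([x_prime, ones])    # (x', y', 1)
    # Row-wise Kronecker product of (x', y', 1) with (x, y, 1).
    return np.einsum("ni,nj->nij", xh_prime, xh).reshape(-1, 9)

# Synthetic rigid motion (hypothetical values, identity calibration).
rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))    # world points, view-1 frame
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.3, 0.1, 0.05])
X2 = X @ R.T + t                                          # same points, view-2 frame
x = X[:, :2] / X[:, 2:]                                   # image points in view 1
x_prime = X2[:, :2] / X2[:, 2:]                           # image points in view 2

t_cross = np.array([[0.0, -t[2], t[1]],
                    [t[2], 0.0, -t[0]],
                    [-t[1], t[0], 0.0]])
F = t_cross @ R                                           # satisfies x_h'^T F x_h = 0

V = kronecker_embedding(x, x_prime)
residual = np.abs(V @ F.ravel()).max()      # F.ravel() is the row-major vec(F)
rank = np.linalg.matrix_rank(V, tol=1e-9)   # dimension of the spanned subspace
print(residual, rank)
```

The residual should vanish up to round-off, and the rank of the embedded data should not exceed 8, in line with the discussion above.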

Since the perspective camera assumption is accurate in 
a much broader range of situations than the affine camera 
model, subspaces are more apparent with the nonlinear em- 
bedding than with the linear embedding. However, the non- 
linear embedding distorts the original sampling and results 
in lower-dimensional structures (of dimension at most 3) 
within the higher dimensional subspaces (of typical dimen- 
sions 6, 7 or 8), which is a serious obstacle for many HLM 
algorithms, especially ones using local spatial information. 

¹ A surface $S$ is ruled if through every point of $S$ there exists a
straight line that lies on $S$.








Fig. 1: Two views of a 3D scene with features overlaid (left), and the
nonlinearly embedded point correspondences in $\mathbb{R}^9$,
projected onto the 3-dimensional subspace spanned by their 3rd, 4th
and 5th principal components (right). (Color figure online)

3 Global Dimension 

From here on, we will be considering 2-view motion segmentation
under the perspective camera model, i.e., using the Kronecker
embedding. We will present a global HLM
method, which is well-suited for handling the data which 
results from this embedding. We begin our development by 
providing some intuitive motivation for our approach. 

Imagine that we have access to an oracle, who for any set of vectors
in $\mathbb{R}^D$, can provide for us a good, robust estimate of the
dimensionality of the set². Now, suppose we have a set of vectors
which are sampled from a hybrid-linear distribution. Consider a
general partition of the data set, and define the "vector of set
dimensions" for that partition to be the vector of oracle-provided
approximate dimensions of each respective set in the partition. Our
inspiration is the observation that for most partitions one may happen
upon, each set in the partition will typically contain points from
many of the underlying subspaces. The associated vector of set
dimensions will contain relatively large numbers, and the $p$-norm of
this vector will be large. The $p$-norm of the vector of set
dimensions will be referred to as the global dimension of the
partition. The best way to make the global dimension small, it would
seem, is to try and decrease all of the elements of the vector of set
dimensions by grouping together vectors that come from common
subspaces. This notion will be made precise and we will show, in fact,
that under certain conditions, the natural partition of the data set
(the one where point assignment agrees with subspace affiliation) is a
global minimizer of the global dimension function.

Our approach to HLM will be to find the partition of a data set that
yields the lowest possible global dimension. In this section, we
develop the global dimension objective function in two parts. In §3.1
we suggest a new class of dimension estimators that can perform the
role of the oracle in our discussion above. In §3.2 we define global
dimension and explain why we expect its minimizer to reveal the
clusters corresponding to the underlying subspaces. A fast algorithm
for this minimization will be described later in §4.

² In a noiseless case this would return the dimension of the linear
span of the set of vectors.



3.1 On Empirical Dimension 

We present here a class of dimension estimators depending on a
parameter $\epsilon \in (0,1]$. For $u = (u_1,\ldots,u_k)$ and any
$p > 0$, we use the notation $\|u\|_p$ to mean
$(u_1^p + \ldots + u_k^p)^{1/p}$ (even for $p = \epsilon < 1$, where
$\|\cdot\|_p$ is not a norm). For a given set of $N$ vectors in
$\mathbb{R}^D$, $\{v_i\}_{i=1}^N$, we denote by
$\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_{N \wedge D})^T$ the
vector of singular values of the $D \times N$ data matrix $A$ (the
matrix whose columns are the data vectors).

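For $\epsilon \in (0,1]$ the empirical dimension, denoted by
$d_\epsilon(v_1,v_2,\ldots,v_N)$ (or simply $d_\epsilon$), is defined by

$$d_\epsilon(v_1,v_2,\ldots,v_N) = \frac{\|\sigma\|_\epsilon}{\|\sigma\|_{\epsilon/(1-\epsilon)}}. \tag{2}$$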



When $\epsilon = 1$, this is sometimes called the "effective rank"³
of the data matrix [22].

The following theorem explains why $d_\epsilon$ is a good estimator
for dimension. Put simply, it says two things. First, if we rotate
and/or uniformly scale our set of vectors by a non-zero amount, then
the empirical dimension of the set does not change. Second, in the
absence of noise, empirical dimension never exceeds true dimension,
but it approaches true dimension in the limit (as the number of
measurements goes to infinity) for spherically symmetric
distributions. From now on we refer to $d$-dimensional subspaces as
$d$-subspaces.

Theorem 1 For $\epsilon \in (0,1]$, $d_\epsilon$ possesses the
following properties:

1. $d_\epsilon$ is invariant under dilations (i.e., scaling).

2. $d_\epsilon$ is invariant under orthogonal transformations.

3. If $\{v_i\}_{i=1}^N$ are contained in a $d$-subspace of
$\mathbb{R}^D$, then $d_\epsilon \leq d$.

4. If $\{v_i\}_{i=1}^N$ are i.i.d. samples from a sub-Gaussian
probability measure, which is spherically symmetric within a
$d$-subspace⁴ and non-degenerate⁵, then
$\lim_{N \to \infty} d_\epsilon(v_1,\ldots,v_N) = d$ with probability 1.

³ "Effective rank" is sometimes defined differently. See [19].

⁴ A measure is spherically symmetric within a $d$-subspace if it is
supported on this subspace and invariant to rotations within this
subspace.

⁵ A measure is non-degenerate on a subspace if it does not concentrate
mass on any proper subspace. In our setting the measure is also
assumed to be spherically symmetric, and this assumption is equivalent
to assuming the measure does not concentrate at the origin.



To gain some intuition into the definition of empirical dimension,
consider taking a large set of samples from a spherically symmetric
distribution supported by a $d$-subspace. Call the covariance matrix
for this distribution $Q$. As the number of samples becomes large, the
empirical covariance matrix approaches $Q$, which (in coordinates
aligned with the subspace) has the first $d$ elements on the main
diagonal all equal (call the value $a^2$), and 0's everywhere else.
The empirical dimension of the set of vectors involves the singular
values of the data matrix, which, up to a common factor of $\sqrt{N}$
that does not affect $d_\epsilon$, are approaching the square roots of
the eigenvalues of $Q$. Hence, as the number of samples increases, we
get:

$$d_\epsilon(v_1,\ldots,v_N) \to \frac{\|(a,a,\ldots,a,0,\ldots,0)\|_\epsilon}{\|(a,a,\ldots,a,0,\ldots,0)\|_{\epsilon/(1-\epsilon)}} = \frac{d^{1/\epsilon}\,a}{d^{(1-\epsilon)/\epsilon}\,a} = d. \tag{3}$$

Thus, for any value of $\epsilon$ in $(0,1]$, the empirical dimension
approaches the true dimension of the set as the number of measurements
increases.
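
The definition and the properties listed in Theorem 1 can be probed numerically. The sketch below is illustrative only and assumes NumPy; the function name `empirical_dimension` and all sizes are hypothetical choices, and the data must not be identically zero. Note that for small $\epsilon$, round-off in singular values that should be exactly zero can perturb the estimate very slightly.

```python
import numpy as np

def empirical_dimension(A, eps=0.35):
    """d_eps of the columns of the D x N matrix A, i.e.
    ||sigma||_eps / ||sigma||_{eps/(1-eps)} with sigma the singular values of A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    num = np.sum(sigma ** eps) ** (1.0 / eps)
    if eps == 1.0:
        den = sigma.max()                  # eps/(1-eps) is infinite: max norm
    else:
        delta = eps / (1.0 - eps)
        den = np.sum(sigma ** delta) ** (1.0 / delta)
    return num / den

rng = np.random.default_rng(0)
D, d, N = 9, 3, 20000

# Isotropic Gaussian samples inside a random d-subspace of R^D.
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))
A = basis @ rng.standard_normal((d, N))
d_hat = empirical_dimension(A)             # expected to be close to d

# Invariance under scaling and orthogonal transformations (full-rank data).
B = rng.standard_normal((D, 50))
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
d_B = empirical_dimension(B)
d_B_scaled = empirical_dimension(7.5 * B)
d_B_rotated = empirical_dimension(Q @ B)
print(d_hat, d_B, d_B_scaled, d_B_rotated)
```

A deterministic special case: for three orthonormal columns, all non-zero singular values equal 1 and the estimator returns 3 for any $\epsilon$, matching (3) with $a = 1$.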

If we look at a distribution that is not spherically symmetric, but
still supported by a $d$-subspace, then empirical dimension tends to
under-estimate the true dimension of the distribution, even as the
number of samples approaches infinity. This is actually desirable
behavior. If we take a spherically symmetric distribution in a
$d$-subspace and imagine the process of collapsing it in one direction
until it lies in a $(d-1)$-subspace, then true dimension behaves
discontinuously. The true dimension of a large set of samples will
equal $d$ until the collapsing is complete; at that point the
dimension will instantly drop to $d-1$. Empirical dimension smoothly
drops from $d$ to $d-1$ during this collapsing process. It is in this
setting that we see the necessity of the parameter $\epsilon$. This
parameter controls how quickly the empirical dimension drops from $d$
to $d-1$ in this process. More generally, a low value of $\epsilon$
results in a "strict" dimension estimator (meaning that it will not
under-estimate dimension easily, even when distributions are
asymmetric). When $\epsilon$ is large (approaching 1), empirical
dimension is a lenient dimension estimator. It is much more tolerant
of noise, but it may consequently under-estimate the dimension of
highly asymmetric distributions. The trade-off is that when dealing
with noisy data or distributions only approximately supported by
linear subspaces, a stricter estimator can mistakenly interpret noise
or distortion as energy in new directions, thereby causing an
over-estimate of dimension. Numerical experiments (e.g., Fig. 2) have
shown that values of $\epsilon$ between 0.3 and 0.7 seem to provide
reasonable estimators, which tend to agree with our intuitive notion
of dimension.

In our application, since the perspective camera model is reasonably
accurate (as opposed to the affine model), the nonlinear embedding of
point correspondences results in subspaces with rather negligible
distortion. We can thus afford a low value of $\epsilon$. In fact,
this is needed because the data vectors are frequently distributed in
very non-isotropic ways with this embedding. Thus, to avoid
underestimating dimension, we choose $\epsilon = 0.35$, which lies
just slightly above the lowest value we confirmed for $\epsilon$
(0.3). Notice that this value is not "tuned" to individual data sets,
but is chosen based on the properties of the application as a whole
and the nature of the embedding.

[Plot: "Empirical Dimension of a collapsing set"; curves: Empirical
Dim $\epsilon = 0.35$, $\epsilon = 0.50$, $\epsilon = 0.65$;
horizontal axis: Time]

Fig. 2: The experiment mentioned above is illustrated. A
normally-distributed point cloud is created in $\mathbb{R}^3$ and is
slowly collapsed into a plane and then a line. One can see that if
$\epsilon$ is close to 0, empirical dimension more closely tracks true
dimension, resulting in a strict dimension estimator. If $\epsilon$ is
close to 1, empirical dimension changes more smoothly, resulting in a
lenient estimator (Color figure online)
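
The strict versus lenient behavior can be illustrated with a toy version of the collapsing experiment. This is a sketch under simplifying assumptions, not a reproduction of the experiment in the text: instead of sampling a point cloud, it evaluates the estimator directly on idealized singular values $(1, 1, t)$, where $t$ shrinks from 1 to 0 as the third direction collapses. The function name is hypothetical.

```python
def empirical_dimension_from_singular_values(sigma, eps):
    """d_eps computed directly from a list of singular values (0 < eps < 1)."""
    delta = eps / (1.0 - eps)
    num = sum(s ** eps for s in sigma) ** (1.0 / eps)
    den = sum(s ** delta for s in sigma) ** (1.0 / delta)
    return num / den

# Collapse the third direction of an isotropic 3-D cloud: sigma = (1, 1, t).
for t in (1.0, 0.5, 0.1, 0.01, 0.0):
    row = [empirical_dimension_from_singular_values([1.0, 1.0, t], eps)
           for eps in (0.35, 0.50, 0.65)]
    print(t, [round(v, 3) for v in row])
```

At every intermediate $t$ the value for $\epsilon = 0.35$ stays closer to 3 than the value for $\epsilon = 0.65$, which is the sense in which a smaller $\epsilon$ gives a stricter estimator.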

3.2 On Global Dimension 

Assume we are provided a data set $X$ in $\mathbb{R}^D$ (in our
application $D = 9$ with the nonlinear embedding) and a partition of
it $\Pi = (\Pi_1,\Pi_2,\ldots,\Pi_K)$ for some $K \in \mathbb{N}$
(i.e., $\{\Pi_k\}_{k=1}^K$ are disjoint subsets of $X$ whose union is
$X$). We also assume that $X$ lies on a union of $K$ subspaces and
denote the "correct" (or natural) partition of the data (where each
subset contains only points from a single underlying subspace) by
$\Pi_{Nat}$. For a fixed $\epsilon \in (0,1]$,
$\{d_\epsilon^k\}_{k=1}^K$ are the empirical dimensions of the sets
$\{\Pi_k\}_{k=1}^K$. We seek to minimize a function based on these
dimensions to recover $\Pi_{Nat}$. To this end we define global
dimension (GD). When thinking of this function, we take the set of
data vectors to be fixed and given, and we view GD as a function of
partitions, $\Pi$, of the set of data vectors. For a fixed
$p \in (0,\infty)$ (we discuss the meaning of $p$ later) we define GD
as follows:

$$GD(\Pi) = \|(d_\epsilon^1, d_\epsilon^2, \ldots, d_\epsilon^K)^T\|_p = \Big(\sum_{k=1}^{K} (d_\epsilon^k)^p\Big)^{1/p}. \tag{4}$$

Our strategy for recovering $\Pi_{Nat}$ will be to try and find the
partition of the data set that minimizes $GD(\Pi)$. Intuitively, by
trying to minimize the $p$-norm of the vector of set dimensions, we
are looking for a partition where all of the set dimensions are small.
Imagine trying to minimize this objective function by hand, and
starting with a partition close to, but not equal to $\Pi_{Nat}$. If
there is a point assigned to the wrong cluster, then removing it from
the set it is currently assigned to should result in a significant
drop in the dimension of that particular set. Re-assigning that point
to the correct set, on the other hand, will have little impact on the
dimension of the target set because the point will lie approximately
in the span of other points already in the set.

Thus, such a change would cause a significant drop in one of the set
dimensions, without disturbing the other sets, and the global
dimension will decrease. This would suggest that amongst partitions
that are close to it, $\Pi_{Nat}$ yields the lowest global dimension.
Additionally, if one considers a "random", or usual partition, then
each set in that partition will tend to contain vectors from many
different subspaces. Each set will have a large dimension, and the
global dimension will exceed that of $\Pi_{Nat}$. This would suggest
that minimizing global dimension may be a reasonable objective if we
want to recover $\Pi_{Nat}$.

Unfortunately, there can exist certain special partitions of a data
set that result in low global dimension (in some cases even lower than
that of $\Pi_{Nat}$). For example, let us choose $p = 1$, so that the
global dimension of a partition is simply the sum of the dimensions of
its constituent parts. Now consider 3 lines in the plane, and a data
set consisting of many points sampled from each line. In this case
$\Pi_{Nat}$ will consist of three sets. Each set will contain only
points from a single line. The dimension of each set in $\Pi_{Nat}$ is
1. Hence, $GD(\Pi_{Nat}) = 3$. On the other hand, if we consider the
"degenerate" partition, that simply puts all points in a single set,
then since we are in $\mathbb{R}^2$, the dimension of that set, and
hence the global dimension of the partition, is 2.

The above example is actually rather special. Consider the same data
set, but set $p$ to a large value instead of 1. When $p$ is large, the
global dimension approximately returns the largest value from
$\{d_\epsilon^k\}_{k=1}^K$. Now consider minimizing this quantity,
subject to the constraint that the partition contains no more than 3
sets. Minimizing global dimension in this setting penalizes partitions
consisting of fewer, higher-dimensional sets instead of multiple, more
balanced sets. Specifically, the global dimension of the degenerate
partition is again approximately 2, while the global dimension of
$\Pi_{Nat}$ is approximately 1, since that is the maximum dimension of
its constituent sets. In fact, as shown in the next theorem, using
large $p$ effectively resolves the issue of special partitions
yielding lower global dimension than $\Pi_{Nat}$.
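
The three-lines example can be reproduced in a few lines. The sketch below is illustrative only: it uses true dimension (the dimension of the linear span) in place of empirical dimension, the three line directions and sample points are hypothetical, and the helper names are introduced here for the example.

```python
def true_dimension_2d(points):
    """Dimension of the linear span of a list of points in the plane."""
    nonzero = [(x, y) for x, y in points if (x, y) != (0.0, 0.0)]
    if not nonzero:
        return 0
    x0, y0 = nonzero[0]
    # The span is a line iff every point is parallel to the first nonzero one.
    if all(abs(x0 * y - y0 * x) < 1e-12 for x, y in nonzero):
        return 1
    return 2

def global_dimension(partition, p):
    """p-norm of the vector of set dimensions of a hard partition."""
    return sum(true_dimension_2d(s) ** p for s in partition) ** (1.0 / p)

# Three lines through the origin, four sample points on each.
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
lines = [[(c * dx, c * dy) for c in (-2.0, -1.0, 1.0, 2.0)]
         for dx, dy in directions]
natural = lines                                   # one set per line
degenerate = [lines[0] + lines[1] + lines[2]]     # everything in one set

for p in (1, 15):
    print(p, global_dimension(natural, p), global_dimension(degenerate, p))
```

With $p = 1$ the degenerate partition wins (2 versus 3), while with $p = 15$ the natural partition has global dimension $3^{1/15} \approx 1.08$, well below 2.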

We will consider the setting where we have $K$ distinct linear
subspaces of $\mathbb{R}^D$, each of dimension $d < D$. Call these
subspaces $\{L_k\}_{k=1}^K$. Assume we have a collection of
non-degenerate measures $\{\mu_k\}_{k=1}^K$ supported by these
subspaces (so that $\mu_k$ is supported by $L_k$,
$k = 1,2,\ldots,K$). Let $\{v_n\}_{n=1}^N$ be a set consisting of
$N_k$ i.i.d. points from each $\mu_k$ (so
$N = N_1 + N_2 + \ldots + N_K$). We require that $N_k > d$ for each
$k$ so that each subspace is adequately represented in the data set.
Let $GD_{True}$ be global dimension for a fixed parameter $p$, defined
using true dimension as the "dimension estimator" for a set. That is,
$GD_{True}(\Pi) = \|(d_{True}(\Pi_1),\ldots,d_{True}(\Pi_K))\|_p$,
where the sets $\Pi_k$ are the constituent sets of the partition $\Pi$
and $d_{True}(\cdot)$ returns the true dimension of its parameter set.
Then, we get the following result:

Theorem 2 Let $\{L_k\}_{k=1}^K$, $\{\mu_k\}_{k=1}^K$,
$\{v_n\}_{n=1}^N$ satisfy the conditions above. If
$p > \ln(K)/(\ln(d+1) - \ln(d))$, then amongst all partitions of
$\{v_n\}_{n=1}^N$ into $K$ or fewer sets, the natural partition is
almost surely (w.r.t. $\{\mu_k\}_{k=1}^K$) the unique minimizer of
$GD_{True}$.

The weakness of the above theorem is that it requires all of the
intrinsic subspaces to have the same dimension. In practice, the
global dimension objective function appears to be rather robust to
subspaces with mixed dimensions. If there is a large difference in
dimension between two subspaces in a dataset, then the minimum of
global dimension tends to be very near $\Pi_{Nat}$, the only
difference being that a few points from the higher-dimensional set are
re-assigned to lower-dimensional sets to balance out the set
dimensions.

Theorem 2 gives us a quantitative way of selecting an appropriate
value of $p$ for our applications. Specifically, if we want to be
certain that we can handle up to 4 different sets with intrinsic
dimensions up to 9 (the dimension of the ambient space in our
application), then we would need to choose $p > 13.16$ for Theorem 2
to hold. In our experiments, we set $p = 15$ to give us a safety
margin.
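
The threshold quoted above follows directly from the bound in Theorem 2. One way to read the bound (offered as intuition, not as the proof): the natural partition has $GD_{True} = K^{1/p} d$, any partition containing a set of dimension at least $d+1$ has $GD_{True} \geq d+1$, and $K^{1/p} d < d+1$ holds exactly when $p > \ln(K)/(\ln(d+1) - \ln(d))$. The small sketch below evaluates the bound; the function name `min_p` is hypothetical.

```python
import math

def min_p(K, d):
    """Lower bound on p from Theorem 2: ln(K) / (ln(d + 1) - ln(d))."""
    return math.log(K) / (math.log(d + 1) - math.log(d))

print(round(min_p(4, 9), 2))  # prints 13.16

# With p = 15, K sets of dimension d still beat a single set of dimension d + 1.
K, d, p = 4, 9, 15
print(K ** (1.0 / p) * d < d + 1)
```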



4 A Fast Algorithm for Minimizing Global Dimension 

Global dimension is defined on the set of partitions of a data 
set. With a discrete domain, finding ways of quickly mini- 
mizing the objective function is non-trivial. In this section 
we briefly introduce a method, which we will call Global 
Dimension Minimization (GDM) for doing exactly this. 

GDM is based on the gradient projection method [2, §2.3]. In order to
apply a gradient-based method, we need to re-formulate the problem so
that we have a smooth objective function over a convex domain. To do
this we employ the notion of fuzzy assignment. Rather than trying to
assign each data point a label, identifying it with a single cluster,
we allow each point to be associated with every cluster
simultaneously, in varying amounts. Specifically, we assign each data
point $v_j$ a probability vector where the $i$'th coordinate holds the
strength of $v_j$'s affiliation with cluster $i$. Assuming we have a
data set of $N$ points in $\mathbb{R}^D$, and we seek $K$ clusters, we
need $N$ probability vectors of length $K$ to encode the soft
partition of the data. This membership information will be stored in a
membership matrix, $M$, where each column is a probability vector.
Element $(i,j)$ of the matrix $M$ holds the strength of $v_j$'s
affiliation with cluster $i$.

The next step is to extend the definition of global dimension so that
it is defined on soft partitions in a meaningful way. In its original
formulation, to evaluate the global dimension of a partition, we would
break up the data set into parts, based on the partition, and estimate
the dimension of each part using empirical dimension. To extend this
to soft partitions, we estimate the dimension of the $k$'th set in a
partition by scaling each data point by its respective affiliation
strength to set $k$ ($v_n$ is multiplied by $M_{(k,n)}$). We then use
empirical dimension to estimate the dimension of the scaled set. In
essence, each point is now included in each dimension estimate.
However, if a point is scaled so that it lies near the origin when
considering a given set, it has little impact on the estimated
dimension of that set. In fact, if we look at the global dimension of
a soft partition that assigns each data point entirely to a single set
($M$ has only 1's and 0's in it), then the global dimension of that
soft partition, using our new definition, agrees with the global
dimension of the corresponding "hard partition", using our original
definition. Thus, this change is a reasonable extension of the
original definition to soft partitions. Our extended definition of
global dimension is:



$$GD = \|(d_\epsilon^1, d_\epsilon^2, \ldots, d_\epsilon^K)^T\|_p, \tag{5}$$

where $d_\epsilon^k = d_\epsilon(M_{(k,1)} v_1, M_{(k,2)} v_2, \ldots, M_{(k,N)} v_N)$.
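
The claim that the soft definition agrees with the hard one on 0/1 membership matrices can be checked directly. The following is a minimal sketch assuming NumPy and $0 < \epsilon < 1$; the function names, the random data and the sizes are hypothetical and only serve the illustration.

```python
import numpy as np

def empirical_dimension(A, eps):
    """||sigma||_eps / ||sigma||_{eps/(1-eps)} for the singular values of A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    delta = eps / (1.0 - eps)              # assumes 0 < eps < 1
    return (np.sum(sigma ** eps) ** (1 / eps)
            / np.sum(sigma ** delta) ** (1 / delta))

def soft_global_dimension(V, M, eps=0.35, p=15):
    """V: D x N data matrix; M: K x N membership matrix (columns sum to 1)."""
    # V * M[k] scales column n of V by the weight M[k, n].
    dims = np.array([empirical_dimension(V * M[k], eps)
                     for k in range(M.shape[0])])
    return np.sum(dims ** p) ** (1 / p)

def hard_global_dimension(V, labels, K, eps=0.35, p=15):
    """Global dimension of the hard partition given by integer labels."""
    dims = np.array([empirical_dimension(V[:, labels == k], eps)
                     for k in range(K)])
    return np.sum(dims ** p) ** (1 / p)

rng = np.random.default_rng(0)
D, N, K = 4, 12, 2
V = rng.standard_normal((D, N))
labels = np.repeat(np.arange(K), N // K)   # first 6 points -> 0, last 6 -> 1

M = np.zeros((K, N))
M[labels, np.arange(N)] = 1.0              # 0/1 membership matrix

soft = soft_global_dimension(V, M)
hard = hard_global_dimension(V, labels, K)
print(soft, hard)
```

The agreement holds because columns scaled to zero do not change the non-zero singular values of the data matrix.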



With this modified formulation, global dimension is an
almost-everywhere differentiable function defined over the Cartesian
product of $N$ $K$-dimensional probability simplexes. One can check
that this is a convex domain (the product of convex sets is convex). A
natural approach to solving a minimization problem of this sort is the
gradient projection method [2, §2.3]. In this method, we begin at some
initial state, compute the gradient of the objective function, take a
step in the direction opposite the gradient, and then project our new
state back into the domain of optimization. This is repeated until our
state converges.

The gradient of global dimension can be computed, but we need some
notation first. For $k = 1,\ldots,K$, we denote by $A_k$ the
$D$-by-$N$ matrix whose $j$'th column equals $M_{(k,j)} v_j$ for
$j = 1,2,\ldots,N$ (i.e., $A_k$ is the data matrix scaled according to
weights for cluster $k$). Let $A_k = U_k S_k (V_k)^T$ be the thin SVD
of $A_k$, and let $\sigma_k$ denote the vector of elements from the
diagonal of $S_k$. Let $\delta = \epsilon/(1-\epsilon)$. Define

Theorem 3 The derivative of global dimension w.r.t. an arbitrary
element of the membership matrix $M$ is given by:

$$\frac{\partial GD}{\partial M_{(k,n)}} = (V_k)_{(n,\cdot)} \, (d_\epsilon^k)^{p-1} \, \|(d_\epsilon^1,\ldots,d_\epsilon^K)\|_p^{1-p} \, D_k \, (U_k)^T A_{(\cdot,n)}. \tag{7}$$



A proof of Theoreml3]is included in the appendix. This the- 
orem allows us to evaluate the gradient vector of global di- 
mension. As was mentioned before, in an iteration of the 
gradient projection method we take a step in the direction 
opposite the gradient. Computing a good step size is fre- 
quently a challenging task, but here we are fortunate. Our 
domain has a meaningful natural scale, since it is formed as 
a product of probability simplexes. Intuitively, our step size 
should be large enough to move us across the entire space 
in a reasonable number of steps, but small enough that any 
individual membership vector can move only a fraction of 
the way across its own simplex in one step. In practice, we
scale each step so that the membership vectors most affected
by the step (the 10% with the largest gradient columns; see
Algorithm 1) move a distance of 0.3 on average. This seems to
work well in general.
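The step-size rule just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `scaled_gradient_step`, the list-of-lists layout (rows are clusters, columns are points), and the use of the Euclidean norm for the "magnitude" of a gradient column are all assumptions made here.

```python
import math

def scaled_gradient_step(M, G, target=0.3, top_fraction=0.1):
    """Return M - (target / rho) * G for K x N lists of lists M and G.

    rho is the mean Euclidean norm of the `top_fraction` largest columns
    of the gradient G, so those columns move `target` on average.
    (Hypothetical helper; names and layout are assumptions.)
    """
    K, N = len(M), len(M[0])
    # Norm of each column of the gradient (one column per data point).
    col_norms = [math.sqrt(sum(G[k][n] ** 2 for k in range(K))) for n in range(N)]
    n_top = max(1, math.ceil(top_fraction * N))
    rho = sum(sorted(col_norms, reverse=True)[:n_top]) / n_top
    if rho == 0.0:
        # Zero gradient: nothing to do.
        return [row[:] for row in M]
    step = target / rho
    return [[M[k][n] - step * G[k][n] for n in range(N)] for k in range(K)]
```

The result generally leaves the product of simplexes, so it still has to be projected back onto the domain before the next iteration.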

Finally, one can check that projecting onto the domain
of optimization can be accomplished by individually projecting
each column of M onto the standard K-dimensional
probability simplex.
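A minimal sketch of this projection step is given below. The text does not say which simplex-projection routine is used; the sort-based Euclidean projection shown here is a standard choice and is an assumption, as are the function names `project_onto_simplex` and `project_columns`.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Sort-based method: find the largest prefix of the sorted entries for
    which the shifted entry stays positive, then shift and clip.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # the last index satisfying the condition wins
    return [max(x - theta, 0.0) for x in v]

def project_columns(M):
    """Project each column of a K x N membership matrix onto the simplex."""
    K, N = len(M), len(M[0])
    cols = [project_onto_simplex([M[k][n] for k in range(K)]) for n in range(N)]
    return [[cols[n][k] for n in range(N)] for k in range(K)]
```

For example, the column (2, 0) is mapped to (1, 0), while a column that already lies on the simplex, such as (0.5, 0.5), is left unchanged.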

We have outlined a variational method for minimizing
global dimension. The above method forms the core of the
GDM algorithm. However, since the global dimension function
is non-convex, it is important to achieve a reasonably
good initialization. Without going into details, this is done
by starting with a trivial partition (each point in its own set)
and repeatedly merging the sets whose merger keeps the global
dimension smallest, until we have the desired number of clusters
(inspired by ALC [16]). After initialization, the variational
algorithm is run until convergence (or for a fixed, but large,
number of iterations). Thresholding is performed to recover
a "hard partition" from our soft partition. After this is done,
we perform a final genetic stage where we check to see if
any one-point changes to the partition can improve global
dimension. This cleans up small errors which may have occurred
in any of the previous stages. Finally, we run this entire
process several times and return the best partition of all
runs (as measured by global dimension).



Algorithm 1 GDM Algorithm for HLM

Input: X = {x_1, x_2, ..., x_N} ⊂ R^D: data, K: number of clusters, p:
global dimension parameter, ε: empirical dimension parameter, n_1,
n_2, n_3: number of iterations (default: n_1 = n_3 = 10, n_2 = 30)

Output: A partition, Π, of X into K disjoint clusters

for i = 1 : n_1 do
    • Π := Partition of X where each point is in its own set.
    while number of sets in Π greater than K do
        • Randomly choose several pairs of sets.
        • For each pair, measure the effect on global dimension if the
          pair is merged.
        • Merge the pair of sets which results in the lowest global
          dimension.
    end while
    • Convert Π to a soft partition, encoded in membership matrix M.
    for j = 1 : n_2 do
        • Compute gradient of global dimension, ∇GD.
        • Let ρ = average magnitude of largest 10% of columns of ∇GD.
        • Take a step in direction -∇GD of length 0.3/ρ.
        • Project each column of M onto the standard K-dimensional
          probability simplex.
    end for
    • Convert M back to a "hard partition", Π, by thresholding.
    for j = 1 : n_3 do
        for n = 1 : N do
            • Check if re-assigning point n to some other cluster
              decreases global dimension.
            • If so, re-assign point n to that cluster.
        end for
    end for
end for
• Of the partitions found in each of the above runs, return the one
  with lowest global dimension.



4.1 Complexity of GDM

A thorough analysis of the computational complexity is not
included here; this is a short summary of the computational
aspects involved. The main numerical component of GDM
is computing ∇GD. For a single iteration its complexity is
O(K · N · D^2). Our choice of ρ requires a sorting procedure
and is thus of order O(N · log(N)) operations for a single
iteration. The initialization of the algorithm via the ALC-type
procedure [16] requires O(n_1 · N · log(N) · D^2) operations.
Also, the last genetic step has complexity
O(n_1 · n_3 · K · D^2 · N^2) (without taking advantage of
incremental SVD). In theory, we can make the algorithm linear
in the number of points N by randomly initializing it, removing
the genetic "clean-up" step and changing the choice
of ρ. We have good numerical evidence, even with large N,
that this can result in good accuracy and speed for artificial
data. Regardless, for the values of N in our application the
algorithm is sufficiently fast and these additional steps help
improve accuracy, especially for points which are near
several clusters (whose percentage is not negligible when N
is small).



5 Detecting and Rejecting Outliers with GDM 

In practice, it turns out that the GDM algorithm described
above is naturally robust to a small number of outliers (in
that they do not tend to affect the classification of inliers),
but no mechanism was put in place for explicitly detecting
or rejecting these outlying points. In this section, we introduce
a modification to GDM that allows for explicit outlier
detection and rejection. The guiding intuition is that an outlier
has the property that if the true hybrid-linear structure
is reflected in a partition, then no matter which group we
assign the outlier to, it causes a significant increase in the
empirical dimension of that group. This, in turn, results in a
significant increase in global dimension. In other words, if
we have a partition that reflects the true hybrid-linear structure
of the data set, then there is no good place to put an
outlier. If the algorithm were given the option of paying a
fixed, low price for the right to ignore a given point, it would
make sense for it to exercise this option on outliers, and only
segment inliers.

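We propose modifying the global dimension objective
function, and the accompanying variational development, in
the following way:

$$GD(M) = \alpha \, \|M_{1,\cdot}\|_1 + \left\| \left( \hat{d}_{\epsilon,2}, \hat{d}_{\epsilon,3}, \ldots, \hat{d}_{\epsilon,K+1} \right) \right\|_p \tag{8}$$

where:

$$\hat{d}_{\epsilon,k} = \hat{d}_{\epsilon}\left( M_{k,1} v_1, M_{k,2} v_2, \ldots, M_{k,N} v_N \right). \tag{9}$$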



This modification adds an additional "cluster" to the
problem (call it cluster 1), and we treat it differently than
the others. Clusters 2 through K+1 contribute to the global
dimension in the same way that they did in the original
development. Cluster 1 contributes to the cost function the sum
of the membership strengths of all data points to this cluster.
This is the "fuzzy assignment" version of the following notion:
we allow the algorithm to pay a fixed price, α, for the
right to ignore any particular data point (not assign it to any
true cluster).
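The modified cost can be sketched numerically as below. This is an illustration under stated assumptions: the empirical dimension is taken to be the ratio ‖σ‖_ε / ‖σ‖_δ with δ = ε/(1 − ε), which is the quantity bounded in the proof of Property 3 in the appendix, with 0 < ε < 1; the singular values of each weighted data matrix are assumed to be precomputed by the caller; and both function names are hypothetical.

```python
def empirical_dimension(sigma, eps):
    """Ratio ||sigma||_eps / ||sigma||_delta with delta = eps / (1 - eps).

    `sigma` is a list of non-negative singular values and 0 < eps < 1.
    (Assumed form of the empirical dimension; see the lead-in.)
    """
    delta = eps / (1.0 - eps)
    den = sum(s ** delta for s in sigma) ** (1.0 / delta)
    if den == 0.0:
        return 0.0  # all singular values are zero
    num = sum(s ** eps for s in sigma) ** (1.0 / eps)
    return num / den

def outlier_aware_global_dimension(outlier_row, cluster_sigmas, alpha, p, eps):
    """alpha * ||M_1||_1 plus the p-norm of the per-cluster empirical dimensions.

    `outlier_row` holds the memberships of all points to the outlier group
    (cluster 1); `cluster_sigmas` holds one singular-value list per true
    cluster (clusters 2 through K+1).
    """
    penalty = alpha * sum(abs(m) for m in outlier_row)
    dims = [empirical_dimension(s, eps) for s in cluster_sigmas]
    return penalty + sum(d ** p for d in dims) ** (1.0 / p)
```

With d equal non-zero singular values the assumed ratio evaluates to exactly d, so a cluster lying evenly in a d-dimensional subspace contributes d to the norm.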

5.1 Modification to GDM 

The proposed modification to the objective function only
trivially changes the state space (now it is the product of
N (K+1)-dimensional probability simplexes, as opposed to
K-dimensional simplexes). Thus, our method of projecting
states onto the convex domain is effectively the same. The
change to the objective function does mean that we must
re-evaluate the gradient of global dimension. The computation
is very similar to the unmodified version, and the result is:

$$\frac{\partial GD}{\partial M_{1,n}} = \alpha \tag{10}$$

and, for all k > 1,

$$\frac{\partial GD}{\partial M_{k,n}} = \text{[right-hand side not recoverable from the extracted text]} \tag{11}$$

where the notation and constants are as defined in §4. Thus,
the necessary modifications to GDM are:



1. Update the evaluation of the objective function GD according to (8).

2. Update the initialization of the state vector to include an
outlier group.

3. Update the state projection routine to accommodate the
additional dimension in the domain.

4. Update the evaluation of ∇GD according to (10) and (11).



5.2 Practical Implementations of Outlier Rejection 

We have described an idea for how to handle outliers, but
it introduces a new parameter, α. It is not immediately clear
how one should choose this parameter, and how sensitive the
results will be to it. In theory one would need to choose an
outlier cost, α, that is not so high that nothing is ever assigned
to the outlier group, but not so low that large quantities
of inliers are assigned to this group. The appropriate values
would likely depend on multiple quantities, like intrinsic
dimension, noise level, and distortion of the underlying subspaces.
These are quantities that can vary not just between
applications, but also from data set to data set for a single
application. Applying the suggested modification exactly as
proposed (and trying to "tune" this parameter) would therefore
lead to an unreliable and unpredictable algorithm. We
refer to this approach as GDM-Naive, and Figure 3 illustrates
why this method is unsound. Instead, we propose two
variations of this method, which lead to more reliable solutions.

1. GDM Known-Fraction: Run the proposed algorithm
with a fixed, low value of α (we use α = 0.01) but stop
before the threshold step. Rank the data points according
to their membership strengths to the outlier group.
Remove a pre-set fraction of the data set (the part that
most strongly affiliates with the outlier group). Continue
with the classic (non-outlier) version of the variational
algorithm on the surviving points only*; this provides
the inlier segmentation. The points that were removed
are labelled outliers.

2. GDM Model-Reassign: Run method 1 above (GDM
Known-Fraction). Fit subspaces of appropriate dimension
(round the empirical dimension) to each set in the
resulting partition. Re-assign all points (including those
that were decided to be outliers) according to their distances
from each subspace. Call a point an outlier if it
is more than some fixed distance, κ, from all of the subspaces.
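The ranking-and-removal step of GDM Known-Fraction can be sketched as follows. This is only an illustration of that one step; the function name `split_by_outlier_membership`, the rounding of the removal count, and the flat list of outlier-group memberships are assumptions, and the surrounding variational iterations are not shown.

```python
def split_by_outlier_membership(outlier_memberships, fraction):
    """Rank points by membership strength to the outlier group and remove
    the top `fraction` of them.

    Returns (kept_indices, removed_indices), both sorted ascending.
    """
    n = len(outlier_memberships)
    n_remove = int(round(fraction * n))
    # Strongest affiliation with the outlier group first.
    order = sorted(range(n), key=lambda i: outlier_memberships[i], reverse=True)
    removed = sorted(order[:n_remove])
    kept = sorted(order[n_remove:])
    return kept, removed
```

The kept indices are the points on which the classic variational iterations would then be continued; the removed indices are the points labelled as outliers.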

* We could skip this step and segment directly from the fuzzy
assignment that we already have. Refining the membership matrix after
removing the outliers is done to repair whatever damage the outliers
may have done to the membership matrix before thresholding.

Fig. 3: Graphical depiction of the problem with GDM-Naive.
[Three panels, each a triangle with corners labelled Outlier Group,
Cluster A, and Cluster B. Left panel: "Large α: Nothing ends up in
outlier group." Middle panel: "Medium α: Only outliers end up in
outlier group." Right panel: "Low α: Some inliers end up in outlier
group."]
The three images here illustrate the problem with GDM-Naive. Each
triangle represents the probability simplex containing the fuzzy
assignment vectors for a fictitious data set. The fuzzy assignment
for each point is plotted after many iterations of GDM. Points in
red are inliers and points in green are outliers. The quantization
regions (for the threshold step) are numbered 1-3. One can see that
if α is not chosen correctly, points can be quantized into the wrong
cluster. On the other hand, the outlier ranking of a point (the # of
points closer to the outlier corner) is a more stable quantity.
(Color figure online)

Each of the proposed methods handles the task of selecting
α, but introduces a new parameter. For method 1 this is
the percentage of the data set to throw out. For method 2
the new parameter is the maximum distance a point can be
from a subspace to be considered an inlier. Both of these
parameters are more natural than selecting α. In a noisy
environment, one may have an idea, based on experiments, of
what percentage of the data set will be outliers, or what the
inlier modelling error tends to be. Additionally, when using
the "Model-Reassign" method, one could find the average
and variance of the residuals, μ and σ^2 respectively,
when fitting subspaces to the inlier clusters. These quantities
can be used to come up with a reasonable value of κ
for a given application (μ + rσ for some r). One could also
find these values on a per-cluster basis and have a different
outlier threshold for each cluster.
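The threshold rule κ = μ + rσ can be sketched as below. This is a minimal illustration: the function names, the use of the population variance, and the representation of each point by its list of distances to the fitted subspaces are assumptions, and the subspace fitting itself is not shown.

```python
import math

def outlier_threshold(residuals, r):
    """Return kappa = mu + r * sigma from inlier fitting residuals.

    mu is the mean and sigma the (population) standard deviation of the
    residuals; r is a user-chosen multiplier.
    """
    n = len(residuals)
    mu = sum(residuals) / n
    variance = sum((x - mu) ** 2 for x in residuals) / n
    return mu + r * math.sqrt(variance)

def flag_outliers(distances, kappa):
    """distances[i] lists the distances of point i to every subspace.

    A point is flagged as an outlier when it is farther than kappa from
    all of the subspaces, i.e. when its smallest distance exceeds kappa.
    """
    return [min(d) > kappa for d in distances]
```

A per-cluster variant would call `outlier_threshold` once per cluster, on that cluster's residuals only, and compare each distance with the threshold of the corresponding subspace.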



6 Results on Real-World Data

6.1 Performance in the Absence of Outliers

We tested the GDM algorithm on 2 motion segmentation
databases. First, we used the outlier-free RAS database [18]
and compared with many leading methods in 2-view segmentation.
We noticed that some of the HLM methods performed
better when using the linearly embedded point correspondences
than with the nonlinear embedding. Therefore,
in Table 1 we present each of the competing HLM algorithms
twice. Where "Linear" appears, the algorithm was
run on the feature trajectories in R^4. Where "Nonlinear" appears,
the algorithm was run on the Kronecker products (in
R^9) of the standard homogeneous coordinates of each feature
correspondence. Figure 5 presents more details on the
performance of the HLM methods with the nonlinear embedding,
and Table 2 gives the average runtimes of these
methods. The other HLM methods we included are SCC [5],
MAPA [6], SSC [9], SLBF [25], and LRR [14]. We also
included two other successful methods for two views (for
which code was available online): RAS [18] and
HOSC [8]. Algorithm parameters and our experiment procedure
are detailed in §8.4.

Fig. 4: Clustering by SCC, GDM, and MAPA on file 6
of the outlier-free RAS database. (Color figure online)
Fig. 5: GDM is compared against other HLM methods
on the nonlinear 2-view embedding of the outlier-free
RAS database. (Color figure online)

From Table 1 and Figure 5 we can see that GDM performs
very competitively on this database. There is only a
single file (#8) on which GDM exhibits significant error. 
This file contains features from two bent magazines as well 
Table 1: Misclassification rates (given as % error) on the outlier-free RAS database.
Columns 1-13 are file numbers; "Avg w/o #8" is the average without File #8.

Method                1      2      3      4      5      6      7      8      9     10     11     12     13  Average  Avg w/o #8
GDM Nonlinear      0.85   0.00   1.57   0.65   0.00   0.00   0.00  12.76   0.00   0.00   0.00   0.00   0.00     1.22        0.26
SCC Linear         0.85   0.00   1.18   0.65   0.00   1.37   0.00   1.42   0.39   0.00   0.00   1.01   0.00     0.53        0.45
SCC Nonlinear      0.85   0.00  24.41   0.00   0.00  19.18   0.00   0.00   0.00  13.97   5.36   0.84   1.10     5.05        5.48
MAPA Linear        0.85   3.65   1.18   0.65   0.00  13.70  15.97   1.29   0.00   0.00   0.00   0.67   3.30     3.17        3.33
MAPA Nonlinear     0.85  20.55  21.65   0.65   0.00  21.92   6.25   7.73   0.00  13.97   1.43   0.34   3.30     7.59        7.57
SSC Linear         1.69  18.26   0.79   1.94   0.00   0.00   6.25  32.22   0.00   0.00  14.64   1.35   4.40     6.27        4.11
SSC Nonlinear      1.27   0.00  22.44   0.65   0.00  21.92   0.00   9.02   0.00  13.97   9.29  12.12   6.59     7.48        7.35
SLBF Linear        0.85   0.46   1.18   0.65   0.00   0.00   0.00   0.26   0.00   0.00   0.00   0.67   4.40     0.65        0.68
SLBF Nonlinear     0.85   0.00   5.12   1.94   0.00  19.18   0.00  10.57   0.00  13.97   0.00   1.68  14.29     5.20        4.75
LRR Linear         5.08  24.66   1.18   2.58   2.38   2.74   0.00  29.12   0.00   0.00   8.93  14.81  18.68     8.47        6.75
LRR Nonlinear      1.27   9.13   2.76   1.94   0.00   0.00   3.47   3.61   0.00   0.00   9.64  18.18   2.20     4.02        4.05
RAS               11.65   0.00   2.56   9.68  16.19  26.03  26.74  11.21   3.28  13.97   3.21   2.36   6.59    10.27       10.19
HOSC d=2           0.85   0.00  24.41   1.61   0.00   0.00   0.00  22.94   0.00   0.00   0.00   0.00   2.20     4.00        2.42
HOSC d=3           1.27  23.74  24.41   3.23   0.00  19.18  12.15  19.59  23.75   0.00   1.43   1.01  17.58    11.33       10.65


Table 2: Average runtimes (per file) of HLM-based
methods on non-linearly embedded (outlier-free) RAS
data.

Method    Runtime (seconds)
GDM                    12.7
SCC                     2.3
MAPA                    5.6
SSC                    89.5
SLBF                    4.0
LRR                     0.8


as a rigid background. Since the bent magazines are clearly
non-rigid, our model assumptions are not met (see Fig. 6).
There were two methods in the comparison that had a lower
average misclassification error than GDM ("SCC Linear"
and "SLBF Linear"). This is because they perform significantly
better on file #8. Both of these are spectral methods,
accompanied by the linear embedding, and are therefore
better able to handle the manifold structure that results
from the non-rigidity of the objects in this file. Amongst
the other files, however, GDM performs better on average
than both of these methods (see the last column of
Table 1). Comparing just the HLM-based methods on the
nonlinearly-embedded data, GDM performs better than any
other method, with the most perfect classifications and the
fewest files with significant errors. Figure 5 shows this
strong performance amongst methods using the nonlinear
embedding more clearly.

We also performed experiments on the Hopkins 155
database [21]. For 2-view segmentation we extracted the
first and last frame of each sequence and performed 2-view
segmentation on the nonlinear embedding (in R^9) of the
data. For comparison, we demonstrate the results of some
other HLM algorithms on this embedded data: MAPA [6],
SCC-MS [5, 25] and SLBF-MS [25]. We also supply results
for a few state-of-the-art HLM methods on the full n-view
feature trajectories. For these n-view results we chose for
Table 3 the best methods on Hopkins 155 we are aware
of that do not require careful parameter tuning:
SSC [9] and SLBF-MS [25]. We also include the reference
(REF) results [21]. REF finds the best linear models
(via least squares approximation) for each cluster of embedded
points (given the ground truth segmentation), and then
finds new clusters by assigning points to the models they
best agree with. For GDM on this database, it was necessary
to increase the number of random initializations (n_1 in
Algorithm 1) to achieve reliable convergence (we changed
it from 10 to 30). From Table 3 we see that GDM outperforms
the other 2-view methods (although SCC matches or
nearly matches its performance in some categories). We remark
that we also tested a genetic algorithm for minimizing
the global dimension and it achieved even more accurate results;
however, we do not include it here since it is not as
fast as GDM.

Fig. 6: File 8 in the RAS database, (a) Frame 1 and (b) Frame 2.
This is a problematic file because the two magazines in the scene
appear to undergo a non-rigid transformation between the two
frames. Point correspondences are colored according to
ground-truth segmentation. (Color figure online)

It is also interesting to note that our results for 2-views
are comparable to the reference results with n-views. That
is, the results of GDM are the best one can expect with pure
linear modeling given many views and assuming an affine camera
model. GDM for n-views gave comparable results and
we thus did not include it. On the other hand, both SLBF-MS
and SSC-N are able to obtain better results with n-views
and this is because their machinery of spectral clustering
(together with good choices of spectral weights) is able to take
into account some of the manifold structure and nearness of
points. That is, they use information beyond linear modeling.

Table 3: The mean and median percentage of misclassified points for two-motions and three-motions in Hopkins 155
database with comparisons to state-of-the-art n-views. Winning results amongst the 2-view methods are bold-faced in
each category.

2-motion              Checker           Traffic         Articulated           All
                    Mean  Median     Mean  Median     Mean  Median     Mean  Median
2-view
  GDM               2.79    0.00     1.78    0.00     2.66    0.00     2.51    0.00
  MAPA             12.85   14.07     6.49    6.93     7.15    5.33    10.69   10.03
  SCC (d=7)         2.79    0.00     1.97    0.00     3.42    0.00     2.64    0.00
  SLBF (d=6)        8.18    1.39     3.98    0.53     4.73    0.40     6.78    1.11
n-view
  SLBF-MS (2F,3)    1.28    0.00     0.21    0.00     0.94    0.00     0.98    0.00
  SSC-N (4K,3)      1.29    0.00     0.29    0.00     0.97    0.00     1.00    0.00
  REF               2.76    0.49     0.30    0.00     1.71    0.00     2.03    0.00

3-motion              Checker           Traffic         Articulated           All
                    Mean  Median     Mean  Median     Mean  Median     Mean  Median
2-view
  GDM               5.37    3.23     4.23    2.69     5.32    5.32     5.14    3.13
  MAPA             21.89   19.49    13.15   13.04     9.04    9.04    19.41   18.09
  SCC (d=7)         8.05    5.85     4.67    5.45     5.85    5.85     7.25    5.45
  SLBF (d=6)       14.08   12.80     7.93    6.75     4.79    4.79    12.32    9.57
n-view
  SLBF-MS (2F,3)    3.33    0.39     0.24    0.00     2.13    2.13     2.64    0.22
  SSC-N (4K,3)      3.22    0.29     0.53    0.00     2.13    2.13     2.62    0.22
  REF               6.28    5.06     1.30    0.00     2.66    2.66     5.08    2.40



6.2 Performance in the Presence of Outliers 



We tested the methods suggested in §5.2 on the outlier-corrupted
RAS database [18]. The performance of classic
GDM (no outlier rejection machinery) is also presented on
this database, as is the performance of GDM on the corresponding
outlier-free database (for comparison purposes).
We also show results from three competing methods for segmenting
motion with outliers: RAS [18], HOSC [8], and
LRR [14, 13] with outlier rejection performed by identifying
the largest columns of E, as suggested in [13, pg. 9].
The details of this experiment, including parameter values,
are given in §8.4.



It is non-trivial to fairly compare different algorithms in
the presence of outliers. Each method generally has at least
one parameter for controlling how it handles outliers. This
parameter balances the desire for a high outlier detection
rate with a desire for a low false alarm rate (these two quantities
are invariably correlated). Using any popular metric
for evaluating segmentation accuracy (like misclassification
rate for true inliers*), the performance of each algorithm will
depend substantially on its outlier handling parameter. In
general terms, if an algorithm is allowed to discard points
as outliers more freely, then the accuracy on the surviving
points will improve. Thus, if one method is more conservative
than another in discarding points as outliers, the results
will likely be skewed in favor of one method over the other.
It is therefore important when looking at segmentation accuracy
to think in terms of accuracy for a given true positive
rate (TPR) and false positive rate (FPR):

$$\mathrm{TPR} = \frac{\#\text{ of outliers that were identified as outliers}}{\#\text{ of outliers in dataset}} \times 100,$$

$$\mathrm{FPR} = \frac{\#\text{ of inliers that were identified as outliers}}{\#\text{ of inliers in dataset}} \times 100.$$

* "True inliers" are points that are inliers according to ground truth.
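The two rates can be computed directly from ground-truth and predicted outlier labels. The sketch below follows the definitions above, including the factor of 100; the function name `tpr_fpr` and the boolean-list inputs are assumptions made for illustration.

```python
def tpr_fpr(is_outlier_true, is_outlier_pred):
    """Return (TPR, FPR) as percentages.

    Both arguments are equal-length lists of booleans: True marks an
    outlier (according to ground truth, or according to the method).
    """
    pairs = list(zip(is_outlier_true, is_outlier_pred))
    n_outliers = sum(1 for t, _ in pairs if t)
    n_inliers = len(pairs) - n_outliers
    if n_outliers == 0 or n_inliers == 0:
        raise ValueError("need at least one true outlier and one true inlier")
    true_positives = sum(1 for t, p in pairs if t and p)
    false_positives = sum(1 for t, p in pairs if (not t) and p)
    return 100.0 * true_positives / n_outliers, 100.0 * false_positives / n_inliers
```

For instance, with two true outliers of which one is detected, and four true inliers of which one is wrongly rejected, the function returns a TPR of 50 and an FPR of 25.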



There are two aspects of these algorithms we wish to 
compare. The first is outlier detection performance (how 
good is each method at distinguishing between inliers and 
outliers). The second is segmentation performance, where 
we evaluate how good each method is at segmenting mo- 
tions in the presence of outliers. 

To compare the outlier detection performance of multiple
methods, a common tool is the ROC curve, which parametrically
plots the TPR vs. FPR as a function of the outlier
parameter for a method. A "random classifier" that randomly
labels points as inliers or outliers will have an ROC
curve lying along the line TPR = FPR. An ideal classifier
will follow the line TPR = 1. Hence, methods can be compared
by seeing which ROC curve is highest over the broadest
range of FPRs (or over the FPRs one is interested in).
The ROC curves for GDM (using the Model-Reassign outlier
detection method and varying κ), LRR (by varying λ),
RAS (by varying "outlierFraction"), and HOSC (by varying
α), are presented in Fig. 7.
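An ROC curve of this kind can be traced by sweeping the outlier parameter and recording one (FPR, TPR) point per value. The sketch below assumes a distance-threshold detector in the spirit of Model-Reassign, where a point is declared an outlier when its score (for example, its distance to the nearest fitted subspace) exceeds κ. Rates are returned as fractions in [0, 1], so that the ideal classifier sits at TPR = 1. The function name and input layout are hypothetical.

```python
def roc_points(scores, is_outlier_true, thresholds):
    """Return a list of (FPR, TPR) pairs, one per threshold.

    scores[i] is the outlier score of point i; a point is predicted to be
    an outlier when its score is strictly greater than the threshold.
    """
    n_outliers = sum(1 for t in is_outlier_true if t)
    n_inliers = len(is_outlier_true) - n_outliers
    points = []
    for kappa in thresholds:
        predicted = [s > kappa for s in scores]
        tp = sum(1 for t, p in zip(is_outlier_true, predicted) if t and p)
        fp = sum(1 for t, p in zip(is_outlier_true, predicted) if (not t) and p)
        points.append((fp / n_inliers, tp / n_outliers))
    return points
```

Sweeping the threshold from very small to very large moves the operating point from (1, 1), where everything is rejected, to (0, 0), where nothing is rejected.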
Table 4: Misclassification rates (given as % error) of inliers on the RAS database. All but 'GDM - clean' are misclassification
rates when run on the outlier-corrupted datasets. 'GDM - clean' gives the performance of the unmodified GDM algorithm when run on the
outlier-removed datasets (included as a reference). Columns 1-13 are file numbers; "Avg w/o #8" is the average without File #8.

Method                      1      2      3      4      5      6      7      8      9     10     11     12     13  Average  Avg w/o #8
GDM - Model-Reassign     2.97   0.00   4.33   1.29   0.95   0.00   0.00  12.63   0.00   6.62   0.00   2.02  17.58     3.72        2.98
GDM - Classic            0.85   0.00   1.57  32.26   0.00   0.00   0.00  22.94   0.00   0.00  22.14   8.75  16.48     8.08        6.84
RAS                     19.49   5.02   1.97   5.81  15.71  23.29  25.00  11.86   2.32  13.97  12.14  18.18  21.98    13.60       13.74
LRR                      4.24  20.55  22.83   7.10   7.14   8.22  18.75  34.54   2.32  27.21   8.57  11.78  25.27    15.27       13.67
HOSC (d=2)              11.02  22.37  16.54  33.55  10.95   2.74  11.11  11.34   3.09  13.97  36.79  66.67   8.79    19.15       19.80
GDM - clean              0.85   0.00   1.57   0.65   0.00   0.00   0.00  12.76   0.00   0.00   0.00   0.00   0.00     1.22        0.26

GDM was again run using the nonlinear embedding of
the data. HOSC was run with the linear embedding and LRR
was run with the nonlinear embedding since these were the
cases that yielded the best performance in the outlier-free
tests for each algorithm (see §8.4 for more details). From
Fig. 7 we can see that GDM is very competitive at detecting
outliers on this database. At low FPRs GDM yields comparatively
excellent performance. At higher FPRs HOSC has
a moderate advantage at outlier detection vs. GDM. However,
it will be seen later (Table 4) that HOSC is not competitive
at segmentation in the presence of outliers. Furthermore,
the presented HOSC results were prepared using d = 2 (see
§8.4), instead of d = 3 as argued for by its authors. Using
d = 3 gave worse results and made the algorithm take an
extremely long time to execute.
Fig. 7: The outlier detection performance of GDM
(Model-Reassign) is compared against other motion
segmentation methods on the outlier-free RAS
database. [ROC plot; horizontal axis: False Positive Rate.]
(Color figure online)

The TPR and FPR for a robust segmentation algorithm
cannot generally be controlled independently or arbitrarily.
Thus, for a comparison of segmentation accuracy, one must
select "reasonable" parameters for each method, which correspond
to the same general region of ROC space. It should
be understood that since the TPR and FPR cannot be controlled
exactly for each method, any such comparison is inherently
unfair, and by manipulating outlier parameters the
results can be skewed somewhat in any direction.

For the purpose of fairly comparing GDM with other
methods, we must select only one of the suggested outlier
detection schemes for GDM ("GDM - Known Fraction"
or "GDM - Model-Reassign"). To effectively use "GDM -
Known Fraction", one must either know roughly what fraction
of the data are going to be outliers, or be in a situation
where over-rejecting points as outliers is acceptable
(one can then over-estimate the outlier fraction). Since this
is not usually the case, we will consider the results of "GDM -
Model-Reassign" when comparing with other methods.

In Table 4 we present a file-by-file comparison of segmentation
accuracy for the aforementioned methods using
parameters that place the FPR of each method in the range
of 0.01 to 0.08. Table 5 reports the average TPR and FPR
for each of these methods.

Table 5: True Positive Rate (TPR) and False Positive
Rate (FPR) for each method in our segmentation
comparison in Table 4. GDM - Model-Reassign, RAS,
LRR, and HOSC were each tuned to achieve a false
positive rate in the range of 0.01 to 0.08.

Method                   TPR    FPR
GDM - Model-Reassign    0.56   0.01
GDM - Classic             NA     NA
RAS                     0.74   0.08
LRR                     0.49   0.04
HOSC                    0.71   0.06
GDM - clean               NA     NA


One can see from Table 4 that "GDM - Model-Reassign"
causes an overall improvement in segmentation accuracy
(vs. "GDM - Classic") in the presence of outliers. There
were several files where the outliers cause the classic GDM
method to misclassify large fractions of the data sets (files
4, 8, and 11 have inlier misclassification rates over 20%).
On these files the error rates of "GDM - Model-Reassign"
are dramatically lower. There are some files where the outlier
detection framework appears to hurt performance, but in
most of these cases the degradation is slight. The results are
competitive with (and in most cases better than) RAS, and
both RAS and GDM are superior to LRR and HOSC in
this comparison. Unlike the strong outlier detection performance
of HOSC discussed earlier, the segmentation capabilities
of HOSC appear very intolerant to outliers (if even
a few outliers slip through, segmentation performance suffers).
GDM and RAS both had certain files in our comparison
where they were clearly dominant over the other, but on
average GDM performed the strongest.



7 Conclusions 

We presented a new approach to 2-view motion segmentation,
which is also a general method for HLM. Its development
was motivated by the main obstacles of recovering
multiple subspaces within the nonlinear embedding of point
correspondences into R^9. The first obstacle is due to nonuniform
distributions along subspaces and the second one is due
to unknown dimensions of subspaces. The idea was to min- 
imize a global quantity, i.e., global dimension, which does 
not make an a-priori assumption on the dimensions of the 
underlying subspaces. We formulated a fast method to min- 
imize this global dimension, which we referred to as GDM. 
We demonstrated state-of-the-art results of GDM for 2-view 
motion segmentation. 

We carefully explained the meaning of the two main parameters
in our algorithm, p and ε, and the trade-offs they
express. We gave a theoretical basis for selecting an appropriate
value of p. These parameters are
fixed throughout the paper. We described a preliminary theory
which motivated the notion of global dimension, and we
justified why it makes sense as an objective function in our 
application. 

Finally, we presented an outlier detection/rejection 
framework for GDM. We explored two complementary implementations
of this framework, and we presented results
demonstrating that it is competitive at handling outliers in 
this application. 



8 Appendix 

8.1 Proof of Theorem 1

We prove the four properties of the statement of the theorem. 
For simplicity we assume that D < N. That is, the number 
of data points is greater than the dimension of the ambient 
space. This is the usual case in many applications. 



Proof of Property 1: Clearly, scaling all data vectors by
α ≠ 0 results in scaling all the singular values of the corresponding
data matrix by |α|. Furthermore, this results in
scaling by |α| both the numerator and denominator of the expression
for the empirical dimension for any ε > 0. Therefore,
the empirical dimension is invariant to this scaling.

Proof of Property 2: The singular values of a matrix (in particular
the data matrix) are invariant to any orthogonal transformation
of this matrix and thus the empirical dimension is
invariant to such transformation.

Proof of Property 3: If $\{v_i\}_{i=1}^{N}$ are contained in a d-subspace,
then since these form the columns of A,
rank(A) ≤ d. Since U and V are orthogonal, rank(A) =
rank(Σ). In particular, A has at most d non-zero singular values. Let
σ be the vector of singular values of A, and let $1_\sigma$ be the
indicator vector of σ.**

The generalized Hölder's inequality [11, pg. 10] states
that if:

$$p_1, p_2 \in (0, \infty] \quad \text{and} \quad \frac{1}{r} = \frac{1}{p_1} + \frac{1}{p_2} \tag{12}$$

then

$$\|f_1 f_2\|_r \le \|f_1\|_{p_1} \|f_2\|_{p_2} \quad \text{for any functions } f_1 \text{ and } f_2. \tag{13}$$

To apply this result to vectors, we view them as functions
over the set {1, 2, ..., D} with counting measure.

Let $p_1 = 1$, $p_2 = \frac{\epsilon}{1-\epsilon}$, $r = \epsilon$. Also let $f_1 = 1_\sigma$, $f_2 = \sigma$.
These values satisfy (12). We therefore get:

$$\frac{\|\sigma\|_{\epsilon}}{\|\sigma\|_{\frac{\epsilon}{1-\epsilon}}} \le \|1_\sigma\|_1 = (\#\text{ of non-zero sing. values of } A) \le d. \tag{14}$$

** $1_\sigma$ has a 1 in each coordinate where σ has a non-zero element,
and 0's in all other coordinates.

Proof of Property 4: By hypothesis, the data vectors $\{v_i\}_{i=1}^{\infty}$
are i.i.d. and sampled according to probability measure μ,
where μ is sub-Gaussian, non-degenerate, and spherically
symmetric in a d-subspace of R^D. We define the nth data
matrix:

$$A_n = \begin{bmatrix} \uparrow & \uparrow & \uparrow & & \uparrow \\ v_1 & v_2 & v_3 & \cdots & v_n \\ \downarrow & \downarrow & \downarrow & & \downarrow \end{bmatrix}$$

Then $\Sigma_n := \frac{1}{n} A_n A_n^T$ is the nth sample covariance matrix
of our data set. Also, let v be a random variable with probability
measure μ. Then $\Sigma := E[v v^T]$ is the covariance matrix
of the distribution. A consequence of μ being spherically
symmetric in a d-subspace is that after an appropriate
rotation of space, Σ is diagonal with a fixed constant in d of
its diagonal entries and 0 in all other locations. We are trying
to prove a result about empirical dimension, which is scale
invariant and invariant under rotations of space. Because of
these two properties we can assume that the appropriate rotation
and scaling has been done so that Σ is diagonal with
value 1 in d diagonal entries and 0 in all others. Without any
loss of generality, we assume that the first d diagonal entries
are the non-zero ones.

Let $\sigma_n = (\sigma_{n,1}, \sigma_{n,2}, \ldots, \sigma_{n,D})^T$, $n > D$, denote the vector of singular values of the matrix $A_n$. Our first task will be to show that $\sigma_n / \sqrt{n}$ converges in probability (as $n \to \infty$) to the vector:

$$(1, 1, \ldots, 1, 0, \ldots, 0)^T. \tag{15}$$

To accomplish our task, we will first relate $\sigma_n$ to the vector of singular values of $\Sigma_n$, and then use a result showing that $\Sigma_n$ converges to $\Sigma$ as $n \to \infty$.

It is clear that the vector of singular values of $\Sigma_n$, which we will denote by $\psi_n$, is given by:

$$\psi_n = \frac{1}{n} \left( \sigma_{n,1}^2, \sigma_{n,2}^2, \ldots, \sigma_{n,D}^2 \right)^T. \tag{16}$$

Next, we will need the following result regarding covariance estimation. This is Corollary 5.50 of [22], adapted to be consistent with our notation.

Lemma 1 (Covariance Estimation): Consider a sub-Gaussian distribution in $\mathbb{R}^D$ with covariance matrix $\Sigma$. Let $\gamma \in (0, 1)$, and $t \ge 1$. If $n \ge C (t/\gamma)^2 D$, then with probability at least $1 - 2e^{-t^2 D}$, $\|\Sigma_n - \Sigma\|_2 \le \gamma$, where $\| \cdot \|_2$ denotes the spectral norm (i.e., largest singular value of the matrix). The constant $C$ depends only on the sub-Gaussian norm of the distribution.

In our problem, we are applying this lemma to the distribution $\mu$. Let $\gamma \in (0, 1)$ be given. If

$$n \ge C (t/\gamma)^2 D, \tag{17}$$

then $\|\Sigma_n - \Sigma\|_2 \le \gamma$ with probability at least $1 - 2e^{-t^2 D}$. The 2-norm of the difference of two matrices bounds the differences of their individual singular values. We will use the following result to make this precise:

Lemma 2 [13]: Let $\sigma_i(\cdot)$ denote the $i$th largest singular value of an arbitrary $m$-by-$n$ matrix. Then $|\sigma_i(B + E) - \sigma_i(B)| \le \|E\|_2$, for each $i$.

Because $\Sigma$ is diagonal with only values 1 and 0 on the diagonal, the singular values of $\Sigma$ are simply these diagonal values. We will use $\mathbb{1}_{i \in 1:d}$ to denote the $i$th singular value of $\Sigma$.



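Setting $B = \Sigma_n$ and $E = \Sigma - \Sigma_n$ in Lemma 2 we get:

$$\|\Sigma_n - \Sigma\|_2 \le \gamma \;\Longrightarrow\; \left| \tfrac{1}{n} \sigma_{n,i}^2 - \mathbb{1}_{i \in 1:d} \right| \le \|\Sigma_n - \Sigma\|_2 \le \gamma,$$

for each $i$. This implies that:

$$\frac{\sigma_{n,i}}{\sqrt{n}} \in \begin{cases} \left[ \sqrt{1-\gamma}, \sqrt{1+\gamma} \right], & \text{if } i \le d; \\ \left[ 0, \sqrt{\gamma} \right], & \text{if } i > d. \end{cases} \tag{18}$$

Notice that as $\gamma \to 0$, $\sigma_{n,i}/\sqrt{n}$ approaches $\mathbb{1}_{i \in 1:d}$. Specifically, for any desired tolerance, $\eta > 0$, and any desired certainty, $\xi$, $n$ can be chosen large enough that with probability greater than $\xi$, $\left| \sigma_{n,i}/\sqrt{n} - \mathbb{1}_{i \in 1:d} \right| < \eta$, simultaneously for each $i$. It follows from this that the vector $\sigma_n/\sqrt{n}$ converges in probability to (15) as $n \to \infty$.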



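Finally,

$$d_{\epsilon,n} = \frac{\|\sigma_n\|_\epsilon}{\|\sigma_n\|_{\frac{\epsilon}{1-\epsilon}}} = \frac{\left\| \sigma_n / \sqrt{n} \right\|_\epsilon}{\left\| \sigma_n / \sqrt{n} \right\|_{\frac{\epsilon}{1-\epsilon}}}.$$

Thus, $d_{\epsilon,n}$ is a continuous function of the vector $\sigma_n/\sqrt{n}$. Hence, since $\sigma_n/\sqrt{n}$ converges to the vector with entries $\mathbb{1}_{i \in 1:d}$ as $n \to \infty$, $d_{\epsilon,n}$ converges in probability to

$$\frac{\|(1, 1, \ldots, 1, 0, \ldots, 0)\|_\epsilon}{\|(1, 1, \ldots, 1, 0, \ldots, 0)\|_{\frac{\epsilon}{1-\epsilon}}} = \frac{d^{1/\epsilon}}{d^{(1-\epsilon)/\epsilon}} = d^{\frac{1}{\epsilon} - \frac{1-\epsilon}{\epsilon}} = d. \tag{19}$$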
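As a quick numerical sanity check of Properties 1, 3 and the limit value in (19), the empirical dimension can be evaluated directly on vectors of singular values. The sketch below is only an illustration: the function names, the example spectrum, and the value of $\epsilon$ are assumptions made for this example and are not taken from the experiments.

```python
import math

def lp_quasi_norm(values, p):
    """Return (sum_i |x_i|^p)^(1/p); for 0 < p < 1 this is only a quasi-norm."""
    return sum(abs(x) ** p for x in values) ** (1.0 / p)

def empirical_dimension(singular_values, eps):
    """Ratio ||sigma||_eps / ||sigma||_{eps/(1-eps)}, assuming 0 < eps < 1."""
    delta = eps / (1.0 - eps)
    return lp_quasi_norm(singular_values, eps) / lp_quasi_norm(singular_values, delta)

EPS = 0.6                           # illustrative value only (assumption)
sigma = [5.0, 2.0, 0.5, 0.0, 0.0]   # hypothetical spectrum with 3 non-zero entries

d_hat = empirical_dimension(sigma, EPS)
# Property 1: rescaling every singular value leaves the ratio unchanged.
d_scaled = empirical_dimension([7.0 * s for s in sigma], EPS)
# Limit vector (15) with d = 3 ones: the ratio should equal d.
d_limit = empirical_dimension([1.0, 1.0, 1.0, 0.0, 0.0], EPS)

print(d_hat, d_scaled, d_limit)
```

For this spectrum the computed value stays below 3, the number of non-zero singular values, as Property 3 requires.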



8.2 Proof of Theorem 2

Recall that $\Pi_{Nat}$ denotes the natural partition of the data set. First, we notice that $GD(\Pi_{Nat}) = \|(d_1, d_2, \ldots, d_K)\|_p$, where $d_k$ is the true dimension of set $k$ of the partition. Notice that $d_k$ cannot exceed $d$ since $\mu_k$ is supported by $L_k$, a $d$-subspace. Furthermore, since $\mu_k$ does not concentrate mass on subspaces it is a probability 0 event that all $N_k$ points from $L_k$ lie in a proper subspace of $L_k$. Thus, for the natural partition, $d_k$ is almost surely $d$, for each $k$. Hence, $GD(\Pi_{Nat})$ is almost surely $\|(d, d, \ldots, d)\|_p = (K d^p)^{1/p} = K^{1/p} d$.

Next, we will find a lower bound for the global dimension of any non-natural partition of the data, and show that if $p$ meets the hypothesis criteria, the lower bound we get is greater than $K^{1/p} d = GD(\Pi_{Nat})$. To accomplish this we need the following lemma.

Lemma 1 If $\Pi \neq \Pi_{Nat}$ then $\Pi$ almost surely has one set with dimension at least $d + 1$.

Before proving the lemma, observe that a consequence is that if $\Pi \neq \Pi_{Nat}$, then with probability 1:

$$GD(\Pi) \ge \|(?, \ldots, ?, d+1, ?, \ldots, ?)\|_p \ge d + 1. \tag{20}$$

Then, from our hypothesis:

$$p > \frac{\ln(K)}{\ln(d+1) - \ln(d)} \;\Longrightarrow\; \left( \frac{d+1}{d} \right)^p > K \;\Longrightarrow\; d + 1 > K^{1/p} d. \tag{21}$$

Hence,

$$GD(\Pi) \ge d + 1 > K^{1/p} d = GD(\Pi_{Nat}). \tag{22}$$
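To make the role of the hypothesis on $p$ concrete, the following sketch evaluates the threshold in (21) and compares $K^{1/p} d$ with $d + 1$ for one value of $p$ above the threshold and one below it. The values $K = 2$, $d = 3$ and the two choices of $p$ are hypothetical and were picked only for illustration.

```python
import math

def p_threshold(K, d):
    """Lower bound on p from the hypothesis: ln(K) / (ln(d+1) - ln(d))."""
    return math.log(K) / (math.log(d + 1) - math.log(d))

def global_dimension(dims, p):
    """l_p norm of the vector of per-cluster dimensions."""
    return sum(x ** p for x in dims) ** (1.0 / p)

K, d = 2, 3                        # hypothetical: two clusters, 3-dimensional subspaces
p_min = p_threshold(K, d)          # about 2.41

# Natural partition: every cluster has dimension d, so GD = K^(1/p) * d.
gd_natural_above = global_dimension([d] * K, p=3.0)   # p above the threshold
gd_natural_below = global_dimension([d] * K, p=2.0)   # p below the threshold

print(p_min, gd_natural_above, gd_natural_below, d + 1)
```

With $p = 3$ the natural partition has global dimension below $d + 1 = 4$, so the chain of inequalities (22) separates it from every non-natural partition; with $p = 2$ the value exceeds 4 and the argument no longer applies.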



Thus, if we show Lemma 1 the proof of the theorem follows. To prove Lemma 1 we require a simpler lemma:

Lemma 2 If a set $Q$ in $\Pi$ has fewer than $d$ points from a subspace $L_i$, then either $Q$ has dimension at least $d + 1$ or adding another point from $L_i$ to $Q$ (an R.V. $X$ with probability measure $\mu_i$, independent from all other samples) will almost surely increase the dimension of $Q$ by 1.

Proof If $\dim(Q) \le d$ then $Q$ has dimension strictly less than the ambient space ($\mathbb{R}^D$). Observe that $\operatorname{span}(Q)$ is a linear subspace of $\mathbb{R}^D$, which a.s. does not contain $L_i$. We cannot have proper containment since $\dim(L_i) = d \ge \dim(Q)$. Also, we have fewer than $d$ points from $L_i$ in $Q$, and each other point in $Q$ lies in $L_i$ with probability 0 (all $\mu_j$ do not concentrate mass on subspaces). Thus, $\operatorname{span}(Q)$ a.s. does not equal $L_i$.

Therefore, if we intersect $L_i$ with $\operatorname{span}(Q)$ we get a proper subspace of $L_i$; call it $L$. We note that $\mu_i(L) = 0$ since $\mu_i$ does not concentrate on subspaces. Thus, since $X$ has probability measure $\mu_i$, $X$ a.s. lies outside the intersection of $L_i$ and $\operatorname{span}(Q)$. It follows that if we add $X$ to $Q$, the dimension of $Q$ a.s. increases by 1.

$\square$



Now we prove Lemma 1. We will assume all sets in $\Pi$ have dimension less than $d + 1$ and pursue a contradiction.

By Lemma 2 the first $d$ points from each $L_i$ almost surely increase the dimension of their respective sets by 1. This can only be accomplished if all sets in $\Pi$ have dimension equal to $d$.

Now, since $\Pi \neq \Pi_{Nat}$, some point in some $L_i$ does not lie in the same set of $\Pi$ as all other points in $L_i$. It follows that the first $d + 1$ points from this $L_i$ almost surely increase the dimension of their respective sets by 1. We therefore have at least $Kd + 1$ points which a.s. inflate the dimension of their respective sets by 1. By the Pigeon-Hole Principle, some set in $\Pi$ has dimension at least $d + 1$, contradicting our hypothesis.

$\square$

8.3 Proof of Theorem 3

Recall that the soft partition is stored in a membership matrix $M$. Specifically, the $(k,n)$th element of $M$, denoted $m_n^k$, holds the "probability" that vector $v_n$ belongs to cluster $k$. Thus, each column of $M$ forms a probability vector.

Hence, global dimension is a real-valued function of the matrix $M$. We will think of the membership matrix as being vectorized, so that the domain of optimization can be thought of as a subset of $\mathbb{R}^{KN}$. However, we will not explicitly vectorize the membership matrix. Thus, when we talk about the gradient of global dimension, we are referring to another $K$-by-$N$ matrix, where the $(k,n)$th element is the derivative of global dimension w.r.t. $m_n^k$.

To differentiate global dimension we must be able to differentiate the singular values of a matrix w.r.t. each element of that matrix. A treatment of this is available in [17].

To begin, recall the definition of GD:

$$GD = \left\| \left( d_\epsilon^1, d_\epsilon^2, \ldots, d_\epsilon^K \right) \right\|_p = \left( (d_\epsilon^1)^p + (d_\epsilon^2)^p + \cdots + (d_\epsilon^K)^p \right)^{1/p}. \tag{23}$$

We will denote the thin SVD (only $D$ columns of $U$ and $V$ are used) of $A_k$:

$$A_k = U_k \Sigma_k V_k^T. \tag{24}$$

Also, we will let $\sigma_j^k$ refer to the $(j,j)$th element of $\Sigma_k$. Then, using the chain rule:

$$\frac{\partial GD}{\partial m_n^k} = \frac{\partial GD}{\partial d_\epsilon^1} \frac{\partial d_\epsilon^1}{\partial m_n^k} + \cdots + \frac{\partial GD}{\partial d_\epsilon^K} \frac{\partial d_\epsilon^K}{\partial m_n^k}. \tag{25}$$

From (23) we can compute $\frac{\partial GD}{\partial d_\epsilon^i}$ rather easily:

$$\frac{\partial GD}{\partial d_\epsilon^i} = \frac{1}{p} \left( (d_\epsilon^1)^p + (d_\epsilon^2)^p + \cdots + (d_\epsilon^K)^p \right)^{\frac{1}{p} - 1} p \, (d_\epsilon^i)^{p-1} = (d_\epsilon^i)^{p-1} \left( (d_\epsilon^1)^p + (d_\epsilon^2)^p + \cdots + (d_\epsilon^K)^p \right)^{\frac{1}{p} - 1}. \tag{26}$$
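The closed form for $\partial GD / \partial d_\epsilon^i$ in (26), namely $(d_\epsilon^i)^{p-1} \|(d_\epsilon^1, \ldots, d_\epsilon^K)\|_p^{1-p}$, can be checked against a central finite difference. The sketch below is a minimal illustration; the per-cluster dimensions and the value of $p$ are hypothetical.

```python
import math

def global_dimension(dims, p):
    """GD = ((d^1)^p + ... + (d^K)^p)^(1/p), as in (23)."""
    return sum(x ** p for x in dims) ** (1.0 / p)

def gd_partial(dims, p, i):
    """Closed-form partial derivative of GD w.r.t. dims[i], following (26)."""
    return dims[i] ** (p - 1.0) * global_dimension(dims, p) ** (1.0 - p)

def gd_partial_numeric(dims, p, i, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    up = list(dims)
    down = list(dims)
    up[i] += h
    down[i] -= h
    return (global_dimension(up, p) - global_dimension(down, p)) / (2.0 * h)

dims = [2.5, 3.1, 1.7]   # hypothetical empirical dimensions of K = 3 clusters
p = 4.0                  # illustrative exponent (assumption)

analytic = [gd_partial(dims, p, i) for i in range(len(dims))]
numeric = [gd_partial_numeric(dims, p, i) for i in range(len(dims))]
print(analytic)
print(numeric)
```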



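Next, we expand the other components of the chain rule (25):

$$\frac{\partial d_\epsilon^i}{\partial m_n^k} = \sum_{j=1}^{D} \frac{\partial d_\epsilon^i}{\partial \sigma_j^i} \frac{\partial \sigma_j^i}{\partial m_n^k}. \tag{27}$$

We now use the definition of $d_\epsilon^i$ to compute the first factor of each term as follows, writing $\sigma^i = (\sigma_1^i, \ldots, \sigma_D^i)$ and $\delta = \epsilon/(1-\epsilon)$:

$$\begin{aligned}
\frac{\partial d_\epsilon^i}{\partial \sigma_j^i}
&= \frac{\partial}{\partial \sigma_j^i} \left( \frac{\|\sigma^i\|_\epsilon}{\|\sigma^i\|_\delta} \right)
= \frac{\|\sigma^i\|_\delta \, \frac{\partial}{\partial \sigma_j^i} \left( (\sigma_1^i)^\epsilon + \cdots + (\sigma_D^i)^\epsilon \right)^{1/\epsilon} - \|\sigma^i\|_\epsilon \, \frac{\partial}{\partial \sigma_j^i} \left( (\sigma_1^i)^\delta + \cdots + (\sigma_D^i)^\delta \right)^{1/\delta}}{\|\sigma^i\|_\delta^2} \\
&= \frac{\|\sigma^i\|_\delta \, \|\sigma^i\|_\epsilon^{1-\epsilon} (\sigma_j^i)^{\epsilon-1} - \|\sigma^i\|_\epsilon \, \|\sigma^i\|_\delta^{1-\delta} (\sigma_j^i)^{\delta-1}}{\|\sigma^i\|_\delta^2} \\
&= C_1^i (\sigma_j^i)^{\epsilon-1} - C_2^i (\sigma_j^i)^{\delta-1},
\end{aligned} \tag{28}$$

where

$$C_1^i = \frac{\|\sigma^i\|_\epsilon^{1-\epsilon}}{\|\sigma^i\|_\delta}, \qquad C_2^i = \frac{\|\sigma^i\|_\epsilon}{\|\sigma^i\|_\delta^{1+\delta}}. \tag{29}$$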
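The derivative of the empirical dimension with respect to a single singular value can be verified numerically. The sketch below assumes the reading $C_1 = \|\sigma\|_\epsilon^{1-\epsilon} / \|\sigma\|_\delta$ and $C_2 = \|\sigma\|_\epsilon / \|\sigma\|_\delta^{1+\delta}$ with $\delta = \epsilon/(1-\epsilon)$, which is what differentiating the ratio of the two norms yields; the spectrum and the value of $\epsilon$ are hypothetical, and all singular values are taken strictly positive so that the powers $\sigma_j^{\epsilon-1}$ are finite.

```python
import math

def lp_quasi_norm(values, p):
    """Return (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(x) ** p for x in values) ** (1.0 / p)

def empirical_dimension(sigma, eps):
    """||sigma||_eps / ||sigma||_delta with delta = eps / (1 - eps)."""
    delta = eps / (1.0 - eps)
    return lp_quasi_norm(sigma, eps) / lp_quasi_norm(sigma, delta)

def empirical_dimension_gradient(sigma, eps):
    """Closed form C1 * s^(eps-1) - C2 * s^(delta-1) for each singular value s."""
    delta = eps / (1.0 - eps)
    norm_eps = lp_quasi_norm(sigma, eps)
    norm_delta = lp_quasi_norm(sigma, delta)
    c1 = norm_eps ** (1.0 - eps) / norm_delta
    c2 = norm_eps / norm_delta ** (1.0 + delta)
    return [c1 * s ** (eps - 1.0) - c2 * s ** (delta - 1.0) for s in sigma]

def finite_difference_gradient(sigma, eps, h=1e-6):
    """Central differences, perturbing one singular value at a time."""
    grad = []
    for j in range(len(sigma)):
        up = list(sigma)
        down = list(sigma)
        up[j] += h
        down[j] -= h
        grad.append((empirical_dimension(up, eps) - empirical_dimension(down, eps)) / (2.0 * h))
    return grad

EPS = 0.6                 # illustrative value only (assumption)
sigma = [5.0, 2.0, 0.5]   # hypothetical strictly positive singular values

analytic = empirical_dimension_gradient(sigma, EPS)
numeric = finite_difference_gradient(sigma, EPS)
# Scale invariance (Property 1) implies sum_j sigma_j * (d d_eps / d sigma_j) = 0.
euler_sum = sum(s * g for s, g in zip(sigma, analytic))
print(analytic, numeric, euler_sum)
```

The vanishing of `euler_sum` is a useful cross-check: it is Euler's identity for a function that is homogeneous of degree zero, which is exactly the scale invariance of the empirical dimension.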



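Next, we must evaluate the second factor in each term of (27). Recall that $\sigma_j^i$ is the $j$th largest singular value of the matrix $A_i$. To achieve the next step, we must observe that each singular value of $A_i$ depends, in general, on each element of the matrix $A_i$. We can then compute the derivative of each element of the matrix $A_i$ w.r.t. each membership variable, $m_n^k$. We will denote the $(\alpha, \beta)$th element of the matrix $A_i$ by $A_{i(\alpha,\beta)}$. Using the chain rule:

$$\frac{\partial \sigma_j^i}{\partial m_n^k} = \sum_{\alpha} \sum_{\beta} \frac{\partial \sigma_j^i}{\partial A_{i(\alpha,\beta)}} \frac{\partial A_{i(\alpha,\beta)}}{\partial m_n^k}. \tag{30}$$

A powerful result [17, eqn. 7] allows us to express the partial derivative of each singular value, $\sigma_j^i$, w.r.t. a given matrix element in terms of the already-known SVD of $A_i$:

$$\frac{\partial \sigma_j^i}{\partial A_{i(\alpha,\beta)}} = U_{i(\alpha,j)} V_{i(\beta,j)}. \tag{31}$$

The second factor in each term of (30) can be evaluated directly from the definition of $A_i$:

$$\frac{\partial A_{i(\alpha,\beta)}}{\partial m_n^k} = \begin{cases} 0, & \text{if } n \neq \beta \text{ or if } i \neq k; \\ v_n \cdot e_\alpha, & \text{if } n = \beta \text{ and } i = k, \end{cases} \tag{32}$$

where $e_\alpha$ denotes the $\alpha$th standard basis vector (1 in position $\alpha$ and 0's everywhere else).

We are now in a position to work backwards and construct the partial derivative of GD w.r.t. $m_n^k$. In what follows, $\delta_{ik}$ is equal to 1 if $i = k$ and is 0 otherwise (this is not to be confused with the un-subscripted $\delta$, which is shorthand for $\epsilon/(1-\epsilon)$). Also, for notational convenience, we use Matlab notation to represent a row or column of a matrix ($B_{(n,:)}$ and $B_{(:,n)}$, respectively). We first compute $\partial \sigma_j^i / \partial m_n^k$ as follows:

$$\begin{aligned}
\frac{\partial \sigma_j^i}{\partial m_n^k}
&= \sum_{\alpha=1}^{D} U_{i(\alpha,j)} V_{i(n,j)} \left( v_n \cdot e_\alpha \right) \delta_{ik}
= \begin{bmatrix} U_{i(1,j)} V_{i(n,j)} \\ U_{i(2,j)} V_{i(n,j)} \\ \vdots \\ U_{i(D,j)} V_{i(n,j)} \end{bmatrix} \cdot v_n \, \delta_{ik} \\
&= V_{i(n,j)} \begin{bmatrix} U_{i(1,j)} \\ U_{i(2,j)} \\ \vdots \\ U_{i(D,j)} \end{bmatrix} \cdot v_n \, \delta_{ik}
= V_{i(n,j)} \left( U_{i(:,j)} \cdot v_n \right) \delta_{ik}.
\end{aligned} \tag{33}$$

Then from (27), we get

$$\frac{\partial d_\epsilon^i}{\partial m_n^k} = \sum_{j=1}^{D} \frac{\partial d_\epsilon^i}{\partial \sigma_j^i} V_{i(n,j)} \left( U_{i(:,j)} \cdot v_n \right) \delta_{ik}. \tag{34}$$

Now we can write:

$$\frac{\partial d_\epsilon^i}{\partial m_n^k} = C_1^i \sum_{j=1}^{D} (\sigma_j^i)^{\epsilon-1} V_{i(n,j)} \left( U_{i(:,j)} \cdot v_n \right) \delta_{ik} - C_2^i \sum_{j=1}^{D} (\sigma_j^i)^{\delta-1} V_{i(n,j)} \left( U_{i(:,j)} \cdot v_n \right) \delta_{ik}. \tag{35}$$

We now simplify the components of (35). After some manipulation, and using the notation

$$(\Sigma_i)^{\epsilon-1} = \begin{bmatrix} (\sigma_1^i)^{\epsilon-1} & & \\ & \ddots & \\ & & (\sigma_D^i)^{\epsilon-1} \end{bmatrix}, \tag{36}$$

we can write

$$\sum_{j=1}^{D} (\sigma_j^i)^{\epsilon-1} V_{i(n,j)} \left( U_{i(:,j)} \cdot v_n \right) = \left[ (\sigma_1^i)^{\epsilon-1} V_{i(n,1)}, \ldots, (\sigma_D^i)^{\epsilon-1} V_{i(n,D)} \right] \begin{bmatrix} U_{i(:,1)} \cdot v_n \\ U_{i(:,2)} \cdot v_n \\ \vdots \\ U_{i(:,D)} \cdot v_n \end{bmatrix} = V_{i(n,:)} (\Sigma_i)^{\epsilon-1} (U_i)^T v_n. \tag{37}$$

Similarly, we can simplify part of the second term of (35):

$$\sum_{j=1}^{D} (\sigma_j^i)^{\delta-1} V_{i(n,j)} \left( U_{i(:,j)} \cdot v_n \right) = V_{i(n,:)} (\Sigma_i)^{\delta-1} (U_i)^T v_n. \tag{38}$$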



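Substituting (37) and (38) into (35) we get

$$\frac{\partial d_\epsilon^i}{\partial m_n^k} = \delta_{ik} \left[ C_1^i V_{i(n,:)} (\Sigma_i)^{\epsilon-1} (U_i)^T v_n - C_2^i V_{i(n,:)} (\Sigma_i)^{\delta-1} (U_i)^T v_n \right].$$

With this expression we are ready to evaluate $\frac{\partial GD}{\partial m_n^k}$ as follows:

$$\begin{aligned}
\frac{\partial GD}{\partial m_n^k}
&= \sum_{i=1}^{K} \frac{\partial GD}{\partial d_\epsilon^i} \frac{\partial d_\epsilon^i}{\partial m_n^k} \\
&= \sum_{i=1}^{K} (d_\epsilon^i)^{p-1} \left( (d_\epsilon^1)^p + \cdots + (d_\epsilon^K)^p \right)^{\frac{1}{p}-1} \delta_{ik} \, V_{i(n,:)} \left( C_1^i (\Sigma_i)^{\epsilon-1} - C_2^i (\Sigma_i)^{\delta-1} \right) (U_i)^T v_n \\
&= (d_\epsilon^k)^{p-1} \left( (d_\epsilon^1)^p + \cdots + (d_\epsilon^K)^p \right)^{\frac{1}{p}-1} V_{k(n,:)} D_k (U_k)^T v_n \\
&= (d_\epsilon^k)^{p-1} \left\| \left( d_\epsilon^1, \ldots, d_\epsilon^K \right) \right\|_p^{1-p} V_{k(n,:)} D_k (U_k)^T v_n,
\end{aligned} \tag{39}$$

where

$$D_k = C_1^k (\Sigma_k)^{\epsilon-1} - C_2^k (\Sigma_k)^{\delta-1}. \tag{40}$$

We re-write (39) as follows:

$$\frac{\partial GD}{\partial m_n^k} = V_{k(n,:)} \left( (d_\epsilon^k)^{p-1} \left\| \left( d_\epsilon^1, \ldots, d_\epsilon^K \right) \right\|_p^{1-p} D_k (U_k)^T \right) A_{(:,n)}, \tag{41}$$

where $A_{(:,n)} = v_n$ is the $n$th column of the data matrix $A$.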



8.4 Experiment Setup 

For our comparison on the outlier-free RAS database, we 
include the following methods: GDM, SCC [4], MAPA [6],
Sparse Subspace Clustering (SSC) [9], Spectral Local Best-fit
Flats (SLBF) [25], Low-Rank Representation (LRR) [14],
RAS, and HOSC [8]. Each algorithm was run 10 times
on each file in the database and the median misclassification
rates per file were aggregated for this comparison.
GDM was run with $n_1 = 10$. The other parameters ($\epsilon$ and
p) are fixed throughout all experiments and are addressed
earlier. SCC was run with d = 3 for the linearly embedded
data (this was found to give the best results), and d = 7
for the nonlinearly embedded data (as recommended in [4]).
MAPA was run without any special parameters. SLBF was 
run with d = 3 for the linearly embedded data and d = 6 
for the nonlinearly embedded data and a was set to 20,000 
for both cases (d and a were selected by trial and error to 
give the best results). LRR was run with $\lambda = 100$ for the
linear case and $\lambda = 10000$ for the non-linear case (these



seemed to give the best results). RAS proved rather sensi- 
tive to its main parameter ("angleTolerance"), and no single 
value gave good across-the-board results. We ran with all de- 
fault parameters and many other combinations. The results 
presented were generated using angleTolerance = 0.22 and 
boundaryThreshold = 5, as this combination gave the best
results from our tests (better than the algorithm's defaults).
HOSC was run with $\eta$ automatically selected by the algorithm
from the range [0.0001, 0.1]. The parameter "knn" was
set to 20, and the default "heat" kernel was chosen. The algorithm
was tried with d set to 2 and 3. Both of these cases
are presented; d = 2 gave better results, but the authors of
HOSC argue for using d = 3 in this setting.
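The aggregation described above (several runs per file, then the median misclassification rate for that file) can be sketched as follows. This is only an illustration under the usual convention that cluster labels are matched to ground truth by the best permutation; the function names and the toy labels are hypothetical and do not come from the supplemental code.

```python
from itertools import permutations
from statistics import median

def misclassification_rate(true_labels, predicted_labels, num_clusters):
    """Fraction of points mislabeled under the best relabeling of the clusters."""
    n = len(true_labels)
    best_errors = n
    for perm in permutations(range(num_clusters)):
        # perm[c] is the ground-truth label assigned to predicted cluster c.
        errors = sum(1 for t, c in zip(true_labels, predicted_labels) if perm[c] != t)
        best_errors = min(best_errors, errors)
    return best_errors / n

def median_rate_over_runs(true_labels, runs, num_clusters):
    """Median misclassification rate of one file over repeated runs."""
    return median(misclassification_rate(true_labels, r, num_clusters) for r in runs)

# Toy file with 6 point correspondences and 2 motions (hypothetical data).
truth = [0, 0, 0, 1, 1, 1]
runs = [
    [1, 1, 1, 0, 0, 0],   # perfect up to a swap of the two labels
    [0, 0, 1, 1, 1, 1],   # one point assigned to the wrong motion
    [0, 1, 1, 1, 1, 0],   # three points wrong under either labeling
]
file_rate = median_rate_over_runs(truth, runs, num_clusters=2)
print(file_rate)
```

Enumerating all permutations is only practical for a small number of clusters, which is the case for the two- and three-motion sequences considered here.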

For our comparison on the outlier-free Hopkins 155 
database, the algorithms that were selected for the com- 
parison were run once on each of the 155 data files. The 
mean and median performance for each category is reported. 
GDM was run with $n_1 = 30$ to improve reliability. All
other parameters were left fixed, and (as before) the non- 
linearly embedded data was used. Each competing 2-view 
method was run on the non-linearly embedded data with the 
same parameters that gave the best performance on the RAS 
database. The competing n-view methods have their param- 
eters given in the results tables. 

For our outlier comparison on the corrupted RAS 
database, we ran GDM with $n_1 = 30$ (same as for the Hopkins
155 database). For the naive approach, we used a =
0.02. For "GDM - Known Fraction" we rejected 20% of the 
dataset. For "GDM - Model Reassign" we used K = 0.05. 
"GDM - Classic" was the same algorithm as in the outlier- 
free comparisons and so had no extra parameters. RAS was 
run with angleTolerance = 0.22 and boundaryThreshold = 5
(same as in the outlier-free tests). We ran LRR with $\lambda = 0.1$,
and outlierThreshold = 0.138 (these gave the best results 
of the combinations we tried). HOSC was run with d = 2 
(which gave the best results in the outlier-free case) and 
a = 0.11. 

The code for GDM can be found on our supplemental 
webpage. For each of the algorithms used in our compar- 
isons, we have made an effort to provide (on the supplemen- 
tal webpage) the code or a link to where the code can be 
found. 